
[Issue]: DeepCache with cpu offload generates distorted images after first batch #2888

Closed
2 tasks done
ledrose opened this issue Feb 19, 2024 · 1 comment

Comments


ledrose commented Feb 19, 2024

Issue Description

Found a problem while testing DeepCache on the dev branch. With model CPU offload enabled, the first image is generated as expected (3x speed-up, expected quality), but all subsequent batches are distorted and show a 9x speed-up. This does not happen with sequential CPU offload or with no offload at all. The problem occurs with every pipeline I tested. I was able to trace it to DeepCache itself and opened an issue on its GitHub (horseee/DeepCache#23) with a reproduction on Kaggle.
There is also a possible temporary fix: move enabling and disabling of deepcache_worker from model compilation to pipeline execution (enable right before, disable right after).
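The proposed workaround can be sketched as a small scoping helper that enables DeepCache immediately before each pipeline call and disables it right after, instead of enabling it once at model-load time. This is a minimal illustration, assuming a `helper` object exposing `enable()`/`disable()` (e.g. DeepCache's `DeepCacheSDHelper`); it is not the actual sd.next implementation.

```python
from contextlib import contextmanager

@contextmanager
def deepcache_scope(helper):
    """Enable DeepCache only for the duration of one pipeline execution."""
    helper.enable()       # turn caching on just before the batch runs
    try:
        yield
    finally:
        helper.disable()  # always turn it off afterwards, even on error

# Hypothetical usage with a diffusers pipeline `pipe` and a DeepCacheSDHelper:
#   with deepcache_scope(helper):
#       images = pipe(prompt).images
```

Wrapping each execution this way keeps the cached features from one batch from leaking into the next when model CPU offload moves the UNet between devices.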

Example of the first 2 generated batches:
(attached grid image: 00946-stable-diffusion-xl-base-1.0-beautiful forest-grid)

Version Platform Description

Python 3.10.13 on Linux
Version: app=sd.next updated=2024-02-18 hash=be81d486 url=https://github.com/vladmandic/automatic.git/tree/dev
Latest published version: 9c12b74 2024-02-18T22:40:13Z
Platform: arch=x86_64 cpu= system=Linux release=6.6.17-1-lts python=3.10.13
AMD ROCm toolkit detected
ROCm agents detected: ['gfx1032', 'gfx90c']
ROCm agent used by default: idx=0 gpu=gfx1032 arch=navi2x
ROCm version detected: 6.0
Load packages: {'torch': '2.3.0.dev20240218+rocm6.0', 'diffusers': '0.26.3', 'gradio': '3.43.2'}
Backend.DIFFUSERS compute=rocm device=cuda attention="Scaled-Dot-Product" mode=no_grad
Device: device=AMD Radeon RX 6600M n=1 hip=6.0.32830-d62f6a171

Relevant log output

22:42:23-699737 DEBUG    Load model: existing=False target=models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b info=None                                                  
22:42:24-057115 DEBUG    Desired Torch parameters: dtype=FP16 no-half=False no-half-vae=False upscast=False                                                                                                                                 
22:42:24-058276 INFO     Setting Torch parameters: device=cuda dtype=torch.float16 vae=torch.float16 unet=torch.float16 context=no_grad fp16=True bf16=None                                                                                 
22:42:24-059113 INFO     Loading VAE: model=models/VAE/sdxl_vae.safetensors source=settings                                                                                                                                                 
22:42:24-059804 DEBUG    Diffusers VAE load config: {'low_cpu_mem_usage': False, 'torch_dtype': torch.float16, 'use_safetensors': True, 'variant': 'fp16'}                                                                                  
22:42:24-060589 INFO     Autodetect: vae="Stable Diffusion XL" class=StableDiffusionXLPipeline file="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b" size=0MB        
22:42:24-466941 DEBUG    Diffusers loading: path="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b"                                                                    
22:42:24-468026 INFO     Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline file="models/Diffusers/models--stabilityai--stable-diffusion-xl-base-1.0/snapshots/462165984030d82259a11f4367a4eed129e94a7b" size=0MB      
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:01<00:00,  4.23it/s]
22:42:26-127056 DEBUG    Setting model VAE: name=sdxl_vae.safetensors                                                                                                                                                                       
22:42:26-127994 DEBUG    Setting model: enable model CPU offload                                                                                                                                                                            
22:42:26-152328 INFO     Model compile: pipeline=StableDiffusionXLPipeline mode=default backend=deep-cache fullgraph=True compile=['Model']                                                                                                 
22:42:26-156748 INFO     Model compile: task=DeepCache config={'cache_interval': 3, 'cache_layer_id': 0, 'cache_block_id': 0, 'skip_mode': 'uniform'} time=0.00                                                                             
22:42:26-182472 INFO     Load embeddings: loaded=0 skipped=11 time=0.02                                                                                                                                                                     
22:42:26-474572 DEBUG    GC: collected=204 device=cuda {'ram': {'used': 1.45, 'total': 15.01}, 'gpu': {'used': 0.17, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.29                                                                      
22:42:26-483255 INFO     Load model: time=2.48 load=2.48 native=1024 {'ram': {'used': 1.45, 'total': 15.01}, 'gpu': {'used': 0.17, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                  
22:42:26-484947 DEBUG    Save: file="config.json" json=28 bytes=1090 time=0.000                                                                                                                                                             
22:42:26-485766 DEBUG    Unused settings: ['onnx_show_menu']                                                                                                                                                                                
22:42:26-486554 DEBUG    Script callback init time: image_browser.py:ui_tabs=1.44 system-info.py:app_started=0.07 task_scheduler.py:app_started=0.15                                                                                        
22:42:26-487325 INFO     Startup time: 16.24 torch=6.55 olive=0.41 gradio=1.16 libraries=1.16 extensions=0.30 ui-en=0.24 ui-txt2img=0.07 ui-img2img=0.09 ui-control=0.13 ui-settings=0.72 ui-extensions=1.60 ui-defaults=0.07 launch=0.43   
                         api=0.08 app-started=0.23 checkpoint=2.79                                                                                                                                                                          
22:42:37-380619 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]), 'negative_prompt_embeds':         
                         torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 8, 'generator': device(type='cuda'), 'output_type': 'latent', 'num_inference_steps': 30, 'eta': 1.0,          
                         'guidance_rescale': 0.7, 'denoising_end': None, 'width': 1024, 'height': 1024, 'parser': 'Full parser'}                                                                                                            
22:42:37-457995 DEBUG    Sampler: sampler="Euler a" config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon', 'rescale_betas_zero_snr': False}         
Progress  4.43s/it ████▌                               13% 4/30 00:22 01:55 Base
22:43:00-105193 DEBUG    VAE load: type=approximate model=models/VAE-approx/model.pt
Progress  1.27s/it █████████████████████████████████ 100% 30/30 00:38 00:00 Base
22:43:20-765553 INFO     Saving: image="outputs/text/10469-stable-diffusion-xl-base-1.0-beautiful forest.jpg" type=JPEG resolution=1024x1024 size=0                                                                                         
22:43:21-522970 DEBUG    Diffuser pipeline: StableDiffusionXLPipeline task=DiffusersTaskType.TEXT_2_IMAGE set={'prompt_embeds': torch.Size([1, 77, 2048]), 'pooled_prompt_embeds': torch.Size([1, 1280]), 'negative_prompt_embeds':         
                         torch.Size([1, 77, 2048]), 'negative_pooled_prompt_embeds': torch.Size([1, 1280]), 'guidance_scale': 8, 'generator': device(type='cuda'), 'output_type': 'latent', 'num_inference_steps': 30, 'eta': 1.0,          
                         'guidance_rescale': 0.7, 'denoising_end': None, 'width': 1024, 'height': 1024, 'parser': 'Full parser'}                                                                                                            
22:43:21-568679 DEBUG    Sampler: sampler="Euler a" config={'num_train_timesteps': 1000, 'beta_start': 0.00085, 'beta_end': 0.012, 'beta_schedule': 'scaled_linear', 'prediction_type': 'epsilon', 'rescale_betas_zero_snr': False}         
Progress 14.21it/s █████████████████████████████████ 100% 30/30 00:02 00:00 Base
22:43:24-698841 INFO     High memory utilization: GPU=86% RAM=50% {'ram': {'used': 7.43, 'total': 15.01}, 'gpu': {'used': 6.83, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                     
22:43:25-360550 DEBUG    GC: collected=163 device=cuda {'ram': {'used': 7.46, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.66                                                                      
22:43:27-347425 INFO     Saving: image="outputs/text/10470-stable-diffusion-xl-base-1.0-beautiful forest.jpg" type=JPEG resolution=1024x1024 size=0                                                                                         
22:43:27-356450 INFO     High memory utilization: GPU=100% RAM=50% {'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 7.95, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                                    
22:43:27-670999 DEBUG    GC: collected=124 device=cuda {'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0} time=0.31                                                                      
22:43:27-672861 INFO     Processed: images=2 time=55.28 its=1.09 memory={'ram': {'used': 7.45, 'total': 15.01}, 'gpu': {'used': 1.18, 'total': 7.98}, 'retries': 0, 'oom': 0}                                                               
22:43:27-691607 INFO     Saving: image="outputs/grids/00946-stable-diffusion-xl-base-1.0-beautiful forest-grid.jpg" type=JPEG resolution=2048x1024 size=0                                                                                   
22:43:59-708367 DEBUG    Server: alive=True jobs=1 requests=87 uptime=101 memory=7.43/15.01 backend=Backend.DIFFUSERS state=idle

Backend

Diffusers

Branch

Dev

Model

SD 1.5

Acknowledgements

  • I have read the above and searched for existing issues
  • I confirm that this is classified correctly and it's not an extension issue
@vladmandic
Owner

I've implemented the suggested workaround for now.

frostyplanet pushed a commit to frostyplanet/inference that referenced this issue Sep 14, 2024
test result shows deepcache should be loaded before cpu_offloading
if not, may cause issues like
horseee/DeepCache#23
vladmandic/automatic#2888

Signed-off-by: wxiwnd <[email protected]>