
This is a fix to get stable_txt2img working on an M1 Mac. #36

Open · wants to merge 1 commit into main

Conversation

beettlle


- Install Stable Diffusion as per https://medium.com/gft-engineering/macbook-m1-how-to-install-and-run-stable-diffusion-7bfb2f802b1a
- Install PyTorch by running `conda install pytorch torchvision torchaudio -c pytorch-nightly`

Running with more than one sample seems to break it, so I'm just running multiple iterations to get the regularization images:
`python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 --scale 10.0 --ddim_steps 50 --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"`
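
For context, the usual root cause in these scripts is a hard-coded CUDA device. Here is a minimal sketch of the kind of device fallback involved (assuming the script picks its device the upstream CompVis way; the actual commit in this PR may differ):

import torch

def pick_device():
    # Prefer CUDA, then Apple's MPS backend, then CPU.
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")

device = pick_device()
x = torch.ones(1, device=device)
print(x.device)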
@swankwc

swankwc commented Sep 28, 2022

How long is it taking you to train the models this way?

@Sorrow

Sorrow commented Sep 28, 2022

I can't get this repo (not the lstein one mentioned by OP) to train on M1.
I was able to patch my way through until I didn't get any visible errors, but inevitably got stuck with training never progressing past epoch 0.

@beettlle
Author

@swankwc I haven't gotten to training yet. ATM this patch is just for stable_txt2img.py, which took 1484.16s user 6134.54s system 24% cpu 8:30:55.51 total.

I'm having problems getting main.py to run. Even if I comment out all the CUDA code and change the Trainer to MPS, I'm still getting a CUDA error in trainer.fit:

Traceback (most recent call last):
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 836, in <module>
    trainer.fit(model, data)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
    _, loss_dict_no_ema = self.shared_step(batch)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
    loss = self(x, c)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
    c = self.get_learned_conditioning(c)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
    return self(text, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 318, in forward
    tokens = batch_encoding["input_ids"].to(self.device)        
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/cuda/__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
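
The failing frame is `tokens = batch_encoding["input_ids"].to(self.device)`, which means the text encoder's `self.device` is still "cuda" even after the Trainer change; in the upstream CompVis encoders that value is a constructor default rather than something the Trainer controls. A toy illustration of the pattern and the fix (a hypothetical stand-in class, not this repo's code):

import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    # Stand-in for FrozenCLIPEmbedder: the device is baked in at construction,
    # and the upstream default is "cuda", so commenting out Trainer-side CUDA
    # code never reaches it.
    def __init__(self, device="cuda"):
        super().__init__()
        self.device = device

    def forward(self, input_ids):
        return input_ids.to(self.device)  # the line that asserts on an M1

device = "mps" if torch.backends.mps.is_available() else "cpu"
enc = TinyEncoder(device=device)  # the fix: pass a detected device instead
print(enc(torch.zeros(1, 77, dtype=torch.long)).device)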

I can keep using this PR to track that work or open a new one. Opinions?

@beettlle
Author

beettlle commented Sep 28, 2022

@Sorrow do you have your work somewhere? You seem to have gotten further than me. Maybe we can collaborate. Here's my WIP; it's very rough ATM: https://github.com/beettlle/Dreambooth-Stable-Diffusion/tree/m1-training-fix

@beettlle beettlle changed the title This is a fix to get this working on an M1 Mac. This is a fix to get stable_txt2img working on an M1 Mac. Sep 28, 2022
@beettlle
Author

Renamed the PR to explain the scope of work better.

@SujeethJinesh

@beettlle, I've been able to get it up and running on my MacBook Pro with some modifications using your code. It's linked here if you'd like to take a look: https://github.com/SujeethJinesh/DreamBoothMac

@beettlle
Author

beettlle commented Nov 2, 2022

That's awesome @SujeethJinesh ! Let me reset my env and I'll try it tomorrow.

@beettlle
Author

beettlle commented Nov 2, 2022

@SujeethJinesh I'm still getting the following error with your branch. Any ideas?

% python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n ramona --gpus 0, --data_root ~/Downloads/ramona --reg_data_root outputs/txt2img-samples --class_word ramona
<gobs and gobs of stuff>
Traceback (most recent call last):
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/main.py", line 806, in <module>
    trainer.fit(model, data)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
    self._call_and_handle_interrupt(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
    results = self._run(model, ckpt_path=self.ckpt_path)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1236, in _run
    results = self._run_stage()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1323, in _run_stage
    return self._run_train()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1345, in _run_train
    self._run_sanity_check()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1413, in _run_sanity_check
    val_loop.run()
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 155, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/base.py", line 204, in run
    self.advance(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 128, in advance
    output = self._evaluation_step(**kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 226, in _evaluation_step
    output = self.trainer._call_strategy_hook("validation_step", *kwargs.values())
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/trainer/trainer.py", line 1765, in _call_strategy_hook
    output = fn(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/pytorch_lightning/strategies/strategy.py", line 344, in validation_step
    return self.model.validation_step(*args, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 368, in validation_step
    _, loss_dict_no_ema = self.shared_step(batch)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 908, in shared_step
    loss = self(x, c)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 937, in forward
    c = self.get_learned_conditioning(c)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/models/diffusion/ddpm.py", line 595, in get_learned_conditioning
    c = self.cond_stage_model.encode(c, embedding_manager=self.embedding_manager)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 324, in encode
    return self(text, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 319, in forward
    z = self.transformer(input_ids=tokens, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 297, in transformer_forward
    return self.text_model(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 258, in text_encoder_forward
    hidden_states = self.embeddings(input_ids=input_ids, position_ids=position_ids, embedding_manager=embedding_manager)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/Documents/github/Dreambooth-Stable-Diffusion/ldm/modules/encoders/modules.py", line 180, in embedding_forward
    inputs_embeds = self.token_embedding(input_ids)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/modules/sparse.py", line 160, in forward
    return F.embedding(
  File "/Users/cdelgado/miniforge3/envs/ldm/lib/python3.10/site-packages/torch/nn/functional.py", line 2206, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Placeholder storage has not been allocated on MPS device!
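
For anyone hitting this later: "Placeholder storage has not been allocated on MPS device!" generally means the weights and the inputs ended up on different devices, e.g. the embedding table moved to MPS while the token ids stayed on CPU. A minimal reproduction and fix, independent of this repo:

import torch
import torch.nn as nn

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
emb = nn.Embedding(100, 8).to(device)      # weights on MPS
ids = torch.zeros(1, 4, dtype=torch.long)  # inputs still on CPU: this mismatch raises the error on MPS

out = emb(ids.to(device))                  # fix: move inputs to the weights' device
print(out.device)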

@HannesGitH

Is there any progress (in @SujeethJinesh's build)?

I can't even generate the regularization images on MPS, as it doesn't support double-precision floats but SD requires them: Cannot convert a MPS Tensor to float64 dtype as the MPS framework doesn't support float64. Please use float32 instead.
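
The float64 tensors usually come from the numpy schedule arrays built during sampler setup. A sketch of the common workaround (the exact call sites vary by fork): downcast to float32 before moving to MPS.

import numpy as np
import torch

betas = np.linspace(1e-4, 2e-2, 1000, dtype=np.float64)  # schedules are float64 numpy arrays

# torch.from_numpy keeps float64, which MPS rejects outright;
# cast to float32 while moving the tensor to the device.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
t = torch.from_numpy(betas).to(device=device, dtype=torch.float32)
print(t.dtype, t.device)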

@alberto-salinas

I tried running @SujeethJinesh's repo and got the same error as you, @HannesGitH. Some additional things: I had to install the following:

conda install pytorch torchvision torchaudio -c pytorch-nightly
conda install chardet

@alberto-salinas

My latest attempt at a fix was to perform the cast as follows:

import torch

class DDIMSampler(object):
    def __init__(self, model, schedule="linear", **kwargs):
        super().__init__()
        self.model = model
        self.ddpm_num_timesteps = model.num_timesteps
        self.schedule = schedule

    def register_buffer(self, name, attr):
        # Move tensors onto MPS and downcast to float32, since MPS has no float64.
        if isinstance(attr, torch.Tensor):
            if attr.device != torch.device("mps"):
                attr = attr.to(torch.device("mps"), torch.float32)
        setattr(self, name, attr)

but that gave me this error:

AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'               | 0/5 [00:00<?, ?it/s]
[1]    1493 abort      python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1  10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

@alberto-salinas

@SujeethJinesh I opened PR SujeethJinesh/DreamBoothMac#3 to fix the float64 error.

I was able to get around the error

AppleInternal/Library/BuildRoots/c651a45f-806e-11ed-a221-7ef33c48bc85/Library/Caches/com.apple.xbs/Sources/MetalPerformanceShaders/MPSCore/Types/MPSNDArray.mm:705: failed assertion `[MPSTemporaryNDArray initWithDevice:descriptor:] Error: product of dimension sizes > 2**31'               | 0/5 [00:00<?, ?it/s]
[1]    1493 abort      python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 2 --n_iter 1  10.
/Users/jose-rs/anaconda3/envs/ldm-mac/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown

I changed the image size to 256 x 256 and that did the trick. It unblocks me for now, but it would be good to figure out a better solution. I'll try to fix it later.
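
The 2**31 assertion appears to be an MPS limit on the product of a tensor's dimension sizes, which the 512 x 512 attention buffers can exceed; halving the resolution shrinks those tensors dramatically. Assuming this fork keeps the CompVis-style size flags, the change is just:

python scripts/stable_txt2img.py --ddim_eta 0.0 --n_samples 1 --n_iter 200 --scale 10.0 --ddim_steps 50 --H 256 --W 256 --ckpt ~/Downloads/sd-v1-4-full-ema.ckpt --prompt "a photo of a <class>"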

@alberto-salinas

In my latest attempt, I tried to run training:

 python main.py --base configs/stable-diffusion/v1-finetune_unfrozen.yaml  -t --actual_resume ~/Downloads/sd-v1-4-full-ema.ckpt -n hello_world --gpus 0, --data_root ~/Downloads/couch_images --reg_data_root ~/Downloads/other_images/ --class_word couch_trainversion_314

I got the following error:

pytorch_lightning.utilities.exceptions.MisconfigurationException: You passed `devices=1` but haven't specified `accelerator=('auto'|'tpu'|'gpu'|'ipu'|'cpu')` for the devices mapping, got `accelerator='mps'`.

My best guess is that the PyTorch Lightning version specified (1.5.9) doesn't have this feature:

https://lightning.ai/docs/pytorch/stable/accelerators/mps_basic.html

@SujeethJinesh how did you get this to work?
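
For reference, on a Lightning release that does ship the MPS accelerator (see the docs linked above), the devices mapping the exception asks for is spelled like this; a sketch that will not work on the pinned 1.5.9:

import pytorch_lightning as pl

# Newer Lightning requires an explicit accelerator for the devices mapping;
# "mps" is only recognized in releases that include MPSAccelerator.
trainer = pl.Trainer(accelerator="mps", devices=1)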

@beettlle
Author

beettlle commented May 4, 2023

@alberto-salinas would you mind trying the following snippet from PyTorch's site in your environment to see if MPS is supported?

import torch
if torch.backends.mps.is_available():
    mps_device = torch.device("mps")
    x = torch.ones(1, device=mps_device)
    print(x)
else:
    print("MPS device not found.")

Output should be:
tensor([1.], device='mps:0')
