The following values were not passed to `accelerate launch` and had defaults used instead:
    `--num_processes` was set to a value of `1`
    `--num_machines` was set to a value of `1`
    `--mixed_precision` was set to a value of `'no'`
    `--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
2025-01-13 21:54:00.044139: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-13 21:54:00.282656: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-13 21:54:00.353328: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-13 21:54:00.753650: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-13 21:54:02.855868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:All CogVideoX models except cogvideox-2b were trained with bfloat16. Using fp16 precision may lead to training instability.
INFO:trainer:Initialized Trainer
INFO:trainer:Accelerator state:
Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda
Mixed precision type: fp16
INFO:trainer:Initializing models
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100% 4/4 [00:00<00:00, 1194.36it/s]
Loading checkpoint shards: 100% 4/4 [00:01<00:00, 2.21it/s]
Fetching 3 files: 100% 3/3 [00:00<00:00, 6250.83it/s]
{'ofs_embed_dim'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1168, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 763, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/drive/MyDrive/train/CogVideo/finetune/train.py', '--model_path', 'THUDM/CogVideoX1.5-5B', '--model_name', 'cogvideox1.5-t2v', '--model_type', 't2v', '--training_type', 'lora', '--output_dir', '/content/drive/MyDrive/train', '--report_to', 'tensorboard', '--data_root', '/content/drive/MyDrive/train/Disney-VideoGeneration-Dataset', '--caption_column', 'prompt.txt', '--video_column', 'videos.txt', '--train_resolution', '49x480x720', '--train_epochs', '10', '--batch_size', '1', '--gradient_accumulation_steps', '1', '--mixed_precision', 'fp16', '--seed', '42', '--num_workers', '2', '--pin_memory', 'False', '--nccl_timeout', '3600', '--checkpointing_steps', '200', '--checkpointing_limit', '10']' died with <Signals.SIGKILL: 9>.
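The traceback comes from the `accelerate` launcher, not from `train.py`: the child process was ended by signal 9 (SIGKILL) before it could raise anything itself. On Linux this often, though not always, means the kernel's out-of-memory killer stopped the process because host RAM ran out. That is an assumption here, since the log does not state the cause. A minimal sketch for checking it is below; `describe_returncode` and `mem_available_gib` are hypothetical helper names, not part of accelerate or CogVideo.

```python
import signal
from typing import Optional


def describe_returncode(returncode: int) -> str:
    """Turn a subprocess return code into a readable description."""
    if returncode < 0:
        # On POSIX, subprocess reports "terminated by signal N" as -N.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


def mem_available_gib(meminfo_text: str) -> Optional[float]:
    """Parse MemAvailable (reported in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            return int(line.split()[1]) / (1024 ** 2)
    return None  # field not present in the given text


if __name__ == "__main__":
    print(describe_returncode(-9))
```

To use it in the same runtime, pass the contents of `/proc/meminfo` to `mem_available_gib` just before launching training. If available host RAM is well below what loading the checkpoint shards needs, memory exhaustion becomes the more likely explanation.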
We have not tested our code in the Colab environment, so there may be compatibility issues. We recommend running fine-tuning in a standard runtime environment instead.
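Separately from the environment question, the log contains a precision warning: the command passes `--mixed_precision fp16`, while the trainer states that all CogVideoX models except cogvideox-2b were trained with bfloat16. The rule in that warning can be sketched as a small check. `precision_from_warning` is a hypothetical helper based only on the warning text, not the trainer's own logic, and whether the GPU in use supports bf16 is a separate question that this sketch does not answer.

```python
def precision_from_warning(model_name: str) -> str:
    """Pick the precision suggested by the trainer's warning message.

    Assumption: "cogvideox-2b" is the only exception, exactly as the
    warning in the log states; every other model name maps to bf16.
    """
    name = model_name.lower()
    return "fp16" if "cogvideox-2b" in name else "bf16"


if __name__ == "__main__":
    # Model name taken from the failing command in the log.
    print(precision_from_warning("cogvideox1.5-t2v"))
```

For the model in this report the sketch returns `bf16`, which differs from the `fp16` value passed on the command line. The warning concerns training stability and does not by itself explain the SIGKILL.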
System Info
Google Colab
Information
Reproduction
Run text-to-video fine-tuning in Google Colab.
Expected behavior