Getting following error on google colab #667

smktech9 · 2025-01-13T22:02:11Z

System Info

Google Colab

Information

The official example scripts
My own modified scripts and tasks

Reproduction

Run the text-to-video (t2v) fine-tuning script in Google Colab.

Expected behavior

The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--mixed_precision was set to a value of 'no'
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
2025-01-13 21:54:00.044139: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:485] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-13 21:54:00.282656: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:8454] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-13 21:54:00.353328: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1452] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-13 21:54:00.753650: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-13 21:54:02.855868: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:All CogVideoX models except cogvideox-2b were trained with bfloat16. Using fp16 precision may lead to training instability.
INFO:trainer:Initialized Trainer
INFO:trainer:Accelerator state:
Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: fp16

INFO:trainer:Initializing models
You set add_prefix_space. The tokenizer needs to be converted from the slow tokenizers
Downloading shards: 100% 4/4 [00:00<00:00, 1194.36it/s]
Loading checkpoint shards: 100% 4/4 [00:01<00:00, 2.21it/s]
Fetching 3 files: 100% 3/3 [00:00<00:00, 6250.83it/s]
{'ofs_embed_dim'} was not found in config. Values will be initialized to default values.
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 48, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1168, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 763, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/drive/MyDrive/train/CogVideo/finetune/train.py', '--model_path', 'THUDM/CogVideoX1.5-5B', '--model_name', 'cogvideox1.5-t2v', '--model_type', 't2v', '--training_type', 'lora', '--output_dir', '/content/drive/MyDrive/train', '--report_to', 'tensorboard', '--data_root', '/content/drive/MyDrive/train/Disney-VideoGeneration-Dataset', '--caption_column', 'prompt.txt', '--video_column', 'videos.txt', '--train_resolution', '49x480x720', '--train_epochs', '10', '--batch_size', '1', '--gradient_accumulation_steps', '1', '--mixed_precision', 'fp16', '--seed', '42', '--num_workers', '2', '--pin_memory', 'False', '--nccl_timeout', '3600', '--checkpointing_steps', '200', '--checkpointing_limit', '10']' died with <Signals.SIGKILL: 9>.

OleehyO · 2025-01-14T04:56:33Z

We have not tested our code in the Colab environment, so there may be compatibility issues. We recommend fine-tuning in a standard runtime environment instead.

OleehyO self-assigned this Jan 14, 2025
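
A note on the traceback above: a child process that dies with SIGKILL right after the checkpoint shards load is often, though not always, a victim of the Linux out-of-memory killer. The thread does not confirm that this is the cause here. The following is a minimal sketch for checking available host RAM before launching, assuming a Linux runtime that exposes /proc/meminfo. The helper names `parse_meminfo` and `available_gib` are hypothetical and are not part of the CogVideo repository.

```python
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} mapping.

    Values are the raw integers from the file (kibibytes for memory fields).
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Skip lines that are not of the form "Name:   12345 kB".
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def available_gib(meminfo_text: str) -> float:
    """Return MemAvailable in GiB, or 0.0 if the field is missing."""
    return parse_meminfo(meminfo_text).get("MemAvailable", 0) / (1024 ** 2)


if __name__ == "__main__":
    # Assumption: the runtime is Linux and /proc/meminfo is readable.
    text = Path("/proc/meminfo").read_text()
    print(f"Available host RAM: {available_gib(text):.1f} GiB")
```

If the reported figure is small compared with the size of the model being loaded, host memory pressure is a plausible explanation for the kill, but this check alone does not prove it.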
