Code run in the cell:
!CUDA_VISIBLE_DEVICES=0 python seq2seq_finetune_t5_ynat.py \
  --do_train --do_eval --predict_with_generate \
  --model_name_or_path /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5 \
  --data_dir /content/drive/MyDrive/ET5_test/ynat-v1.1 \
  --output_dir /content/drive/MyDrive/ET5_test/output \
  --overwrite_output_dir \
  --save_steps 100000 \
  --per_device_train_batch_size 16 \
  --gradient_accumulation_steps 1 \
  --num_train_epochs 1.0
Error message:
12/01/2021 05:14:34 - WARNING - __main__ - Process rank: -1, device: cuda:0, n_gpu: 1, distributed training: False, 16-bits training: False
12/01/2021 05:14:34 - INFO - __main__ - Training/evaluation parameters Seq2SeqTrainingArguments(output_dir='/content/drive/MyDrive/ET5_test/output', overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=<EvaluationStrategy.NO: 'no'>, prediction_loss_only=False, per_device_train_batch_size=16, per_device_eval_batch_size=8, per_gpu_train_batch_size=None, per_gpu_eval_batch_size=None, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=1.0, max_steps=-1, lr_scheduler_type=<SchedulerType.LINEAR: 'linear'>, warmup_steps=0, logging_dir='runs/Dec01_05-14-34_eb535e39a1b5', logging_first_step=False, logging_steps=500, save_steps=100000, save_total_limit=None, no_cuda=False, seed=42, fp16=False, fp16_opt_level='O1', fp16_backend='auto', local_rank=-1, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name='/content/drive/MyDrive/ET5_test/output', disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=False, deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, label_smoothing=0.0, sortish_sampler=False, predict_with_generate=True, encoder_layerdrop=None, decoder_layerdrop=None, dropout=None, attention_dropout=None, lr_scheduler='linear')
[INFO|configuration_utils.py:447] 2021-12-01 05:14:34,267 >> loading configuration file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/config.json
[INFO|configuration_utils.py:485] 2021-12-01 05:14:34,267 >> Model config T5Config { "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 3072, "d_kv": 64, "d_model": 768, "decoder_start_token_id": 0, "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 12, "num_heads": 12, "num_layers": 12, "pad_token_id": 0, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "T5Tokenizer", "transformers_version": "4.3.2", "use_cache": true, "vocab_size": 45100 }
[INFO|configuration_utils.py:447] 2021-12-01 05:14:34,269 >> loading configuration file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/config.json
[INFO|configuration_utils.py:485] 2021-12-01 05:14:34,269 >> Model config T5Config { "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 3072, "d_kv": 64, "d_model": 768, "decoder_start_token_id": 0, "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 12, "num_heads": 12, "num_layers": 12, "pad_token_id": 0, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "T5Tokenizer", "transformers_version": "4.3.2", "use_cache": true, "vocab_size": 45100 }
[INFO|tokenization_utils_base.py:1688] 2021-12-01 05:14:34,269 >> Model name '/content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5' not found in model shortcut name list (t5-small, t5-base, t5-large, t5-3b, t5-11b). Assuming '/content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5' is a path, a model identifier, or url to a directory containing tokenizer files.
[INFO|tokenization_utils_base.py:1721] 2021-12-01 05:14:34,271 >> Didn't find file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/tokenizer.json. We won't load it.
[INFO|tokenization_utils_base.py:1721] 2021-12-01 05:14:34,271 >> Didn't find file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/added_tokens.json. We won't load it.
[INFO|tokenization_utils_base.py:1721] 2021-12-01 05:14:34,272 >> Didn't find file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/special_tokens_map.json. We won't load it.
[INFO|tokenization_utils_base.py:1784] 2021-12-01 05:14:34,273 >> loading file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/spiece.model
[INFO|tokenization_utils_base.py:1784] 2021-12-01 05:14:34,273 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-12-01 05:14:34,273 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-12-01 05:14:34,273 >> loading file None
[INFO|tokenization_utils_base.py:1784] 2021-12-01 05:14:34,273 >> loading file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/tokenizer_config.json
[INFO|modeling_utils.py:1025] 2021-12-01 05:14:34,456 >> loading weights file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/pytorch_model.bin
[INFO|modeling_utils.py:1143] 2021-12-01 05:14:42,207 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.
[INFO|modeling_utils.py:1152] 2021-12-01 05:14:42,207 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5. If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
##### Reading an input file ... /content/drive/MyDrive/ET5_test/ynat-v1.1/train.json
##### Create examples ... : 45678it [00:00, 666217.22it/s]
##### Get source and target texts ... : 100% 45677/45677 [00:00<00:00, 1432044.61it/s]
##### Reading an input file ... /content/drive/MyDrive/ET5_test/ynat-v1.1/val.json
##### Create examples ... : 9107it [00:00, 694942.72it/s]
##### Get source and target texts ... : 100% 9106/9106 [00:00<00:00, 1586826.72it/s]
12/01/2021 05:14:48 - INFO - __main__ - *** Train ***
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py:705: FutureWarning: `model_path` is deprecated and will be removed in a future version. Use `resume_from_checkpoint` instead.
  FutureWarning,
[INFO|trainer.py:724] 2021-12-01 05:14:48,220 >> Loading model from /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5).
[INFO|configuration_utils.py:447] 2021-12-01 05:14:48,222 >> loading configuration file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/config.json
[INFO|configuration_utils.py:485] 2021-12-01 05:14:48,222 >> Model config T5Config { "architectures": [ "T5ForConditionalGeneration" ], "d_ff": 3072, "d_kv": 64, "d_model": 768, "decoder_start_token_id": 0, "dropout_rate": 0.1, "eos_token_id": 1, "feed_forward_proj": "gated-gelu", "initializer_factor": 1.0, "is_encoder_decoder": true, "layer_norm_epsilon": 1e-06, "model_type": "t5", "num_decoder_layers": 12, "num_heads": 12, "num_layers": 12, "pad_token_id": 0, "relative_attention_num_buckets": 32, "tie_word_embeddings": false, "tokenizer_class": "T5Tokenizer", "transformers_version": "4.3.2", "use_cache": true, "vocab_size": 45100 }
[INFO|modeling_utils.py:1025] 2021-12-01 05:14:48,224 >> loading weights file /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5/pytorch_model.bin
[INFO|modeling_utils.py:1143] 2021-12-01 05:14:55,663 >> All model checkpoint weights were used when initializing T5ForConditionalGeneration.
[INFO|modeling_utils.py:1152] 2021-12-01 05:14:55,663 >> All the weights of T5ForConditionalGeneration were initialized from the model checkpoint at /content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5. If your task is similar to the task the model of the checkpoint was trained on, you can already use T5ForConditionalGeneration for predictions without further training.
[INFO|trainer.py:837] 2021-12-01 05:14:56,744 >> ***** Running training *****
[INFO|trainer.py:838] 2021-12-01 05:14:56,744 >> Num examples = 45676
[INFO|trainer.py:839] 2021-12-01 05:14:56,744 >> Num Epochs = 1
[INFO|trainer.py:840] 2021-12-01 05:14:56,744 >> Instantaneous batch size per device = 16
[INFO|trainer.py:841] 2021-12-01 05:14:56,744 >> Total train batch size (w. parallel, distributed & accumulation) = 16
[INFO|trainer.py:842] 2021-12-01 05:14:56,744 >> Gradient Accumulation steps = 1
[INFO|trainer.py:843] 2021-12-01 05:14:56,744 >> Total optimization steps = 2855
0% 0/2855 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "seq2seq_finetune_t5_ynat.py", line 379, in <module>
    main()
  File "seq2seq_finetune_t5_ynat.py", line 316, in main
    model_path=model_args.model_name_or_path if os.path.isdir(model_args.model_name_or_path) else None
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 940, in train
    tr_loss += self.training_step(model, inputs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/trainer.py", line 1320, in training_step
    loss.backward()
  File "/usr/local/lib/python3.7/dist-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/usr/local/lib/python3.7/dist-packages/torch/autograd/__init__.py", line 147, in backward
    allow_unreachable=True, accumulate_grad=True) # allow_unreachable flag
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
0% 0/2855 [00:00<?, ?it/s]
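The traceback stops at `loss.backward()` inside `cublasSgemmStridedBatched`, but CUDA kernels run asynchronously, so the call that reports an error is not necessarily the one that caused it. Setting the environment variable `CUDA_LAUNCH_BLOCKING=1` makes kernel launches synchronous, which should make the traceback point closer to the op that actually failed. A minimal Python sketch that rebuilds the same command with that variable set; the helper `build_debug_run` is hypothetical, and the flags are copied from the cell above.

```python
import os
import sys

# Flags copied from the failing Colab cell; the paths are the reporter's Drive paths.
ARGS = (
    "--do_train", "--do_eval", "--predict_with_generate",
    "--model_name_or_path", "/content/drive/MyDrive/cakd3_3차프로젝트_2조/Datasets/ETRI_ET5",
    "--data_dir", "/content/drive/MyDrive/ET5_test/ynat-v1.1",
    "--output_dir", "/content/drive/MyDrive/ET5_test/output",
    "--overwrite_output_dir",
    "--save_steps", "100000",
    "--per_device_train_batch_size", "16",
    "--gradient_accumulation_steps", "1",
    "--num_train_epochs", "1.0",
)


def build_debug_run(script="seq2seq_finetune_t5_ynat.py", args=ARGS, base_env=None):
    """Hypothetical helper: return (argv, env) for rerunning the script in debug mode."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = "0"  # same single GPU as the original cell
    # Synchronous launches so the traceback names the failing kernel,
    # not a later call such as loss.backward().
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    argv = [sys.executable, script, *args]
    return argv, env
```

To launch it, pass the result to `subprocess.run(argv, env=env)`. Training runs noticeably slower with synchronous launches, so this is only meant for locating the error.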
Similar issue: allenai/allennlp#5064 (comment)
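Possible causes of this CUDA error: the CUDA version does not match the versions of the libraries in use (e.g. PyTorch, transformers), or the input data is in an unexpected format.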
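Both causes can be checked before launching training. A minimal sketch, assuming PyTorch and transformers are the libraries in question: `environment_report` collects the installed versions and the CUDA version the PyTorch build targets, and `find_out_of_range_ids` flags token ids outside the model's embedding table (an out-of-range index on the GPU can surface as an asynchronous CUDA error rather than a clear `IndexError`). Both helper names are hypothetical; `45100` is the `vocab_size` from the `T5Config` in the log above.

```python
import importlib


def environment_report():
    """Hypothetical helper: versions most often involved in CUDA/library mismatches."""
    report = {}
    for name in ("torch", "transformers"):
        try:
            report[name] = importlib.import_module(name).__version__
        except ImportError:
            report[name] = None  # library not installed in this environment
    try:
        import torch
        report["torch_cuda_build"] = torch.version.cuda  # CUDA version this torch build targets
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda_build"] = None
        report["cuda_available"] = False
    return report


def find_out_of_range_ids(sequences, vocab_size, ignore_index=None):
    """Hypothetical helper: (sequence_index, token_id) pairs outside [0, vocab_size).

    ignore_index lets label sequences keep a sentinel such as -100, which
    Hugging Face seq2seq models skip when computing the loss.
    """
    bad = []
    for i, ids in enumerate(sequences):
        for tok in ids:
            if tok == ignore_index:
                continue
            if not 0 <= tok < vocab_size:
                bad.append((i, tok))
    return bad


# Usage sketch (`tokenized` is a hypothetical list of input_id lists from the ET5 tokenizer):
# print(environment_report())
# print(find_out_of_range_ids(tokenized, vocab_size=45100))
```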