I'd appreciate any help solving this issue...
(I've seen people in other threads blaming this type of crash on CPU memory, but a g4dn.12xlarge has 192 GB of RAM. So unless there's a hard threshold in the code, that should be plenty for my 1000/200 training/validation dataset.)
Here is the error trace:
$:~/mistral-finetune$ torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING]
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 3
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Going to init comms...
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Run dir: /home/usr/mistral-finetune/7B
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 1
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 1
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:50 (UTC) - 0:00:06 - train - INFO - TrainArgs: {'batch_size': 1,
'checkpoint': True,
'ckpt_freq': 100,
'data': {'data': '',
'eval_instruct_data': 'kbp_validation_set_prepared_full_mistral.jsonl',
'instruct': {'dynamic_chunk_fn_call': True, 'shuffle': True},
'instruct_data': 'kbp_training_set_prepared_full_mistral.jsonl',
'shuffle': False},
'eval_freq': 100,
'log_freq': 1,
'lora': {'dropout': 0.0, 'enable': True, 'rank': 64, 'scaling': 2.0},
'max_norm': 1.0,
'max_steps': 300,
'mlflow': {'experiment_name': None, 'tracking_uri': None},
'model_id_or_path': '/home/usr/mistral-finetune/mistral_models',
'no_ckpt': False,
'no_eval': False,
'num_ckpt_keep': 3,
'num_microbatches': 1,
'optim': {'lr': 6e-05, 'pct_start': 0.05, 'weight_decay': 0.1},
'run_dir': '/home/usr/mistral-finetune/7B',
'save_adapters': True,
'seed': 0,
'seq_len': 32768,
'wandb': {'key': '',
'offline': True,
'project': 'csa-project',
'run_name': 'csa-run-1'},
'world_size': 4}
wandb: Currently logged in as: ll (ll-itu). Use `wandb login --relogin` to force relogin
2024-08-18 10:17:51 (UTC) - 0:00:07 - metrics_logger - INFO - initializing wandb
wandb: WARNING Changes to your `wandb` environment variables will be ignored because your `wandb` session has already started. For more information on how to modify your settings with `wandb.init()` arguments, please refer to https://wandb.me/wandb-init.
wandb: Tracking run with wandb version 0.17.7
wandb: Run data is saved locally in /home/usr/mistral-finetune/7B/wandb/run-20240818_101751-ebgqmgpj
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run csa-run-1
wandb: ⭐️ View project at https://wandb.ai/ll-itu/csa-project
wandb: 🚀 View run at https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Reloading model from /home/usr/mistral-finetune/mistral_models/consolidated.safetensors ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Converting model to dtype torch.bfloat16 ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Loaded model on cpu!
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Initializing lora layers ...
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Finished initialization!
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Sharding model over 4 GPUs ...
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - Model sharded!
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - 167,772,160 out of 7,415,795,712 parameters are finetuned (2.26%).
2024-08-18 10:18:02 (UTC) - 0:00:18 - dataset - INFO - Loading kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - kbp_training_set_prepared_full_mistral.jsonl loaded and tokenized.
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - Shuffling kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: metrics_logger

(All four ranks then fail with the same exception; their interleaved tracebacks are collapsed to a single copy below.)

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/usr/mistral-finetune/train.py", line 327, in <module>
    fire.Fire(train)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/usr/mistral-finetune/train.py", line 64, in train
    _train(args, exit_stack)
  File "/home/usr/mistral-finetune/train.py", line 243, in _train
    output = model(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
    h = layer(h, freqs_cis, att_mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
    return self.checkpoint_fn(  # type: ignore[misc]
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
    ret = function(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
    r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
    output = memory_efficient_attention(xq, key, val, mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
    return _fMHA.apply(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
    op = _dispatch_fw(inp, True)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
    return _run_priority_list(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 32768, 32, 128) (torch.bfloat16)
     key         : shape=(1, 32768, 32, 128) (torch.bfloat16)
     value       : shape=(1, 32768, 32, 128) (torch.bfloat16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
     p           : 0.0
`[email protected]` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
    bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 128

wandb:
wandb: 🚀 View run csa-run-1 at: https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: ⭐️ View project at: https://wandb.ai/ll-itu/csa-project
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./7B/wandb/run-20240818_101751-ebgqmgpj/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: metrics_logger
[2024-08-18 10:18:08,858] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 6934 closing signal SIGTERM
[2024-08-18 10:18:10,074] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 6935) of binary: /home/usr/mistral-finetune/mistenv/bin/python3.10
Traceback (most recent call last):
  File "/home/usr/mistral-finetune/mistenv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train FAILED
------------------------------------------------------------
Failures:
[1]:
  time       : 2024-08-18_10:18:08
  host       : ip-172-31-x-x.ec2.internal
  rank       : 2 (local_rank: 2)
  exitcode   : 1 (pid: 6936)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time       : 2024-08-18_10:18:08
  host       : ip-172-31-x-x.ec2.internal
  rank       : 3 (local_rank: 3)
  exitcode   : 1 (pid: 6937)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time       : 2024-08-18_10:18:08
  host       : ip-172-31-x-x.ec2.internal
  rank       : 1 (local_rank: 1)
  exitcode   : 1 (pid: 6935)
  error_file : <N/A>
  traceback  : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
$:~/mistral-finetune$
Update: This seems to be an issue with insufficient (per-GPU) memory, which I could get around by reducing the context length in the .yaml so the training footprint is smaller. I couldn't find anything suggesting that in the error messages, though.
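In case it helps anyone else, here is a minimal sketch of that change, assuming the stock example/7B.yaml layout (the key names match the TrainArgs dump above; the exact seq_len value that fits your GPUs is an assumption and may need further tuning):

```yaml
# example/7B.yaml (excerpt, illustrative values only)
# Lowering the per-sample context length shrinks the per-GPU training footprint;
# 32768 is the value shown in the TrainArgs dump above.
seq_len: 8192        # was 32768
batch_size: 1
num_microbatches: 1
```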
Python Version
Pip Freeze
$:~/mistral-finetune$ pip3 freeze
absl-py==2.1.0
annotated-types==0.7.0
attrs==24.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
docker-pycreds==0.4.0
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
gitdb==4.0.11
GitPython==3.1.43
grpcio==1.65.5
idna==3.7
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
Markdown==3.7
MarkupSafe==2.1.5
mistral_common==1.3.4
mpmath==1.3.0
networkx==3.3
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
platformdirs==4.2.2
protobuf==5.27.3
psutil==6.0.0
pydantic==2.8.2
pydantic_core==2.20.1
PyYAML==6.0.2
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
safetensors==0.4.4
sentencepiece==0.2.0
sentry-sdk==2.13.0
setproctitle==1.3.3
simple_parsing==0.1.5
six==1.16.0
smmap==5.0.1
sympy==1.13.2
tensorboard==2.17.1
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.5
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
wandb==0.17.7
Werkzeug==3.0.3
xformers==0.0.24
Reproduction Steps
Expected Behavior
A successful training session.
Additional Context
No response
Suggested Solutions
No response