[BUG: Instruction FT (train.py) crashes on AWS g4dn.12xlarge (4 T4s) with no clear indication of the cause] #92

Open
leloss opened this issue Aug 19, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@leloss

leloss commented Aug 19, 2024

Python Version

Python 3.10 (per the interpreter paths in the traceback below).

Appreciate any help solving the issue...
(I've seen people in other threads blame this type of crash on CPU memory, but a g4dn.12xlarge has 192 GB of RAM. So unless there's a hard threshold in the code, that should be plenty for my 1000/200 training dataset.)

Here is the error trace:

$:~/mistral-finetune$ torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING]
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
[2024-08-18 10:17:43,828] torch.distributed.run: [WARNING] *****************************************
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 3
args: TrainArgs(data=DataArgs(data='', shuffle=False, instruct_data='kbp_training_set_prepared_full_mistral.jsonl', eval_instruct_data='kbp_validation_set_prepared_full_mistral.jsonl', instruct=InstructArgs(shuffle=True, dynamic_chunk_fn_call=True)), model_id_or_path='/home/usr/mistral-finetune/mistral_models', run_dir='/home/usr/mistral-finetune/7B', optim=OptimArgs(lr=6e-05, weight_decay=0.1, pct_start=0.05), seed=0, num_microbatches=1, seq_len=32768, batch_size=1, max_norm=1.0, max_steps=300, log_freq=1, ckpt_freq=100, save_adapters=True, no_ckpt=False, num_ckpt_keep=3, eval_freq=100, no_eval=False, checkpoint=True, world_size=4, wandb=WandbArgs(project='csa-project', offline=True, key='', run_name='csa-run-1'), mlflow=MLFlowArgs(tracking_uri=None, experiment_name=None), lora=LoraArgs(enable=True, rank=64, dropout=0.0, scaling=2.0))
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 0
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Going to init comms...
2024-08-18 10:17:48 (UTC) - 0:00:04 - train - INFO - Run dir: /home/usr/mistral-finetune/7B
[identical TrainArgs dumps from the remaining two ranks, run together on a single line; contents match the args lines above]

2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - torch.cuda.device_count: 4
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - CUDA_VISIBLE_DEVICES: 0,1,2,3
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - local rank: 1
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 2
2024-08-18 10:17:48 (UTC) - 0:00:04 - distributed - INFO - Set cuda device to 1
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:49 (UTC) - 0:00:05 - train - INFO - Going to init comms...
2024-08-18 10:17:50 (UTC) - 0:00:06 - train - INFO - TrainArgs: {'batch_size': 1,
 'checkpoint': True,
 'ckpt_freq': 100,
 'data': {'data': '',
          'eval_instruct_data': 'kbp_validation_set_prepared_full_mistral.jsonl',
          'instruct': {'dynamic_chunk_fn_call': True, 'shuffle': True},
          'instruct_data': 'kbp_training_set_prepared_full_mistral.jsonl',
          'shuffle': False},
 'eval_freq': 100,
 'log_freq': 1,
 'lora': {'dropout': 0.0, 'enable': True, 'rank': 64, 'scaling': 2.0},
 'max_norm': 1.0,
 'max_steps': 300,
 'mlflow': {'experiment_name': None, 'tracking_uri': None},
 'model_id_or_path': '/home/usr/mistral-finetune/mistral_models',
 'no_ckpt': False,
 'no_eval': False,
 'num_ckpt_keep': 3,
 'num_microbatches': 1,
 'optim': {'lr': 6e-05, 'pct_start': 0.05, 'weight_decay': 0.1},
 'run_dir': '/home/usr/mistral-finetune/7B',
 'save_adapters': True,
 'seed': 0,
 'seq_len': 32768,
 'wandb': {'key': '',
           'offline': True,
           'project': 'csa-project',
           'run_name': 'csa-run-1'},
 'world_size': 4}
wandb: Currently logged in as: ll (ll-itu). Use `wandb login --relogin` to force relogin
2024-08-18 10:17:51 (UTC) - 0:00:07 - metrics_logger - INFO - initializing wandb
wandb: WARNING Changes to your `wandb` environment variables will be ignored because your `wandb` session has already started. For more information on how to modify your settings with `wandb.init()` arguments, please refer to https://wandb.me/wandb-init.
wandb: Tracking run with wandb version 0.17.7
wandb: Run data is saved locally in /home/usr/mistral-finetune/7B/wandb/run-20240818_101751-ebgqmgpj
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run csa-run-1
wandb: ⭐️ View project at https://wandb.ai/ll-itu/csa-project
wandb: 🚀 View run at https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: WARNING Calling wandb.login() after wandb.init() has no effect.
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Reloading model from /home/usr/mistral-finetune/mistral_models/consolidated.safetensors ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Converting model to dtype torch.bfloat16 ...
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Loaded model on cpu!
2024-08-18 10:17:51 (UTC) - 0:00:07 - finetune.wrapped_model - INFO - Initializing lora layers ...
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Finished initialization!
2024-08-18 10:17:52 (UTC) - 0:00:08 - finetune.wrapped_model - INFO - Sharding model over 4 GPUs ...
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - Model sharded!
2024-08-18 10:18:02 (UTC) - 0:00:18 - finetune.wrapped_model - INFO - 167,772,160 out of 7,415,795,712 parameters are finetuned (2.26%).
2024-08-18 10:18:02 (UTC) - 0:00:18 - dataset - INFO - Loading kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - kbp_training_set_prepared_full_mistral.jsonl loaded and tokenized.
2024-08-18 10:18:03 (UTC) - 0:00:19 - dataset - INFO - Shuffling kbp_training_set_prepared_full_mistral.jsonl ...
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:04 (UTC) - 0:00:20 - utils - INFO - Closed: metrics_logger
[Tracebacks from three of the four ranks follow here, interleaved with one another and with per-rank "Closing"/"Closed" eval_logger and metrics_logger lines. Each rank fails in model/transformer.py (memory_efficient_attention) with the same NotImplementedError reproduced in full in the traceback below.]
wandb:
wandb: 🚀 View run csa-run-1 at: https://wandb.ai/ll-itu/csa-project/runs/ebgqmgpj
wandb: ⭐️ View project at: https://wandb.ai/ll-itu/csa-project
wandb: Synced 5 W&B file(s), 0 media file(s), 0 artifact file(s) and 0 other file(s)
wandb: Find logs at: ./7B/wandb/run-20240818_101751-ebgqmgpj/logs
wandb: WARNING The new W&B backend becomes opt-out in version 0.18.0; try it out with `wandb.require("core")`! See https://wandb.me/wandb-core for more information.
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: eval_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closing: metrics_logger
2024-08-18 10:18:07 (UTC) - 0:00:24 - utils - INFO - Closed: metrics_logger
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/usr/mistral-finetune/train.py", line 327, in <module>
    fire.Fire(train)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 143, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 477, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/fire/core.py", line 693, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/usr/mistral-finetune/train.py", line 64, in train
    _train(args, exit_stack)
  File "/home/usr/mistral-finetune/train.py", line 243, in _train
    output = model(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 220, in forward
    h = layer(h, freqs_cis, att_mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py", line 168, in forward
    return self.checkpoint_fn(  # type: ignore[misc]
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
    return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 489, in _fn
    return fn(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 17, in inner
    return fn(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 489, in checkpoint
    ret = function(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/fsdp/fully_sharded_data_parallel.py", line 849, in forward
    output = self._fsdp_wrapped_module(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 146, in forward
    r = self.attention(self.attention_norm(x), freqs_cis, att_mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/usr/mistral-finetune/model/transformer.py", line 87, in forward
    output = memory_efficient_attention(xq, key, val, mask)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 223, in memory_efficient_attention
    return _memory_efficient_attention(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 326, in _memory_efficient_attention
    return _fMHA.apply(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/autograd/function.py", line 553, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 42, in forward
    out, op_ctx = _memory_efficient_attention_forward_requires_grad(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/__init__.py", line 351, in _memory_efficient_attention_forward_requires_grad
    op = _dispatch_fw(inp, True)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 120, in _dispatch_fw
    return _run_priority_list(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/xformers/ops/fmha/dispatch.py", line 63, in _run_priority_list
    raise NotImplementedError(msg)
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 32768, 32, 128) (torch.bfloat16)
     key         : shape=(1, 32768, 32, 128) (torch.bfloat16)
     value       : shape=(1, 32768, 32, 128) (torch.bfloat16)
     attn_bias   : <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
     p           : 0.0
`[email protected]` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    bf16 is only supported on A100+ GPUs
`tritonflashattF` is not supported because:
    requires device with capability > (8, 0) but your GPU has capability (7, 5) (too old)
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    operator wasn't built - see `python -m xformers.info` for more info
    triton is not available
    requires GPU with sm80 minimum compute capacity, e.g., A100/H100/L4
`cutlassF` is not supported because:
    bf16 is only supported on A100+ GPUs
`smallkF` is not supported because:
    max(query.shape[-1] != value.shape[-1]) > 32
    dtype=torch.bfloat16 (supported: {torch.float32})
    attn_bias type is <class 'xformers.ops.fmha.attn_bias.BlockDiagonalCausalMask'>
    bf16 is only supported on A100+ GPUs
    unsupported embed per head: 128
[2024-08-18 10:18:08,858] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 6934 closing signal SIGTERM
[2024-08-18 10:18:10,074] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 1 (pid: 6935) of binary: /home/usr/mistral-finetune/mistenv/bin/python3.10
Traceback (most recent call last):
  File "/home/usr/mistral-finetune/mistenv/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/usr/mistral-finetune/mistenv/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
train FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-08-18_10:18:08
  host      : ip-172-31-x-x.ec2.internal
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 6936)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-08-18_10:18:08
  host      : ip-172-31-x-x.ec2.internal
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 6937)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-18_10:18:08
  host      : ip-172-31-x-x.ec2.internal
  rank      : 1 (local_rank: 1)
  exitcode  : 1 (pid: 6935)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
$:~/mistral-finetune$

Pip Freeze

$:~/mistral-finetune$ pip3 freeze
absl-py==2.1.0
annotated-types==0.7.0
attrs==24.2.0
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
docker-pycreds==0.4.0
docstring_parser==0.16
filelock==3.15.4
fire==0.6.0
fsspec==2024.6.1
gitdb==4.0.11
GitPython==3.1.43
grpcio==1.65.5
idna==3.7
Jinja2==3.1.4
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
Markdown==3.7
MarkupSafe==2.1.5
mistral_common==1.3.4
mpmath==1.3.0
networkx==3.3
numpy==1.25.2
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.19.3
nvidia-nvjitlink-cu12==12.6.20
nvidia-nvtx-cu12==12.1.105
packaging==24.1
platformdirs==4.2.2
protobuf==5.27.3
psutil==6.0.0
pydantic==2.8.2
pydantic_core==2.20.1
PyYAML==6.0.2
referencing==0.35.1
regex==2024.7.24
requests==2.32.3
rpds-py==0.20.0
safetensors==0.4.4
sentencepiece==0.2.0
sentry-sdk==2.13.0
setproctitle==1.3.3
simple_parsing==0.1.5
six==1.16.0
smmap==5.0.1
sympy==1.13.2
tensorboard==2.17.1
tensorboard-data-server==0.7.2
termcolor==2.4.0
tiktoken==0.7.0
torch==2.2.0
tqdm==4.66.5
triton==2.2.0
typing_extensions==4.12.2
urllib3==2.2.2
wandb==0.17.7
Werkzeug==3.0.3
xformers==0.0.24

Reproduction Steps

  1. Install libraries and dependencies
  2. export CUDA_VISIBLE_DEVICES=0,1,2,3
  3. configure absolute paths in the 7B.yaml file (see the sketch after this list)
  4. pass dataset validation test
  5. torchrun --nproc-per-node 4 --master_port $RANDOM -m train example/7B.yaml
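
For step 3, the path-related fields in example/7B.yaml were set roughly as follows. This is a sketch reconstructed from the TrainArgs dump in the log above (the key names follow that dump), not the exact file contents:

# example/7B.yaml (sketch; values taken from the logged TrainArgs)
data:
  instruct_data: "kbp_training_set_prepared_full_mistral.jsonl"        # training set
  eval_instruct_data: "kbp_validation_set_prepared_full_mistral.jsonl" # validation set
model_id_or_path: "/home/usr/mistral-finetune/mistral_models"  # base model checkpoint
run_dir: "/home/usr/mistral-finetune/7B"                       # output / checkpoint directory
seq_len: 32768
batch_size: 1
num_microbatches: 1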

Expected Behavior

A successful training session.

Additional Context

No response

Suggested Solutions

No response

@leloss leloss added the bug Something isn't working label Aug 19, 2024
@leloss
Author

leloss commented Aug 20, 2024

Update: This seems to be an issue with insufficient per-GPU memory, which I could work around by reducing the context length in the .yaml for a smaller training footprint. I couldn't find anything suggesting that in your error messages, though.
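
For anyone hitting the same thing: the workaround amounts to lowering seq_len in example/7B.yaml from the 32768 shown in the TrainArgs dump above. The reduced value below is only an illustration of the kind of change, not necessarily the exact value that was used:

seq_len: 8192  # was 32768; a shorter context shrinks the per-step activation footprint on each T4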
