[BUG] (Duplicate Flag Generation)_ __main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm #797

Open
unclemusclez opened this issue Oct 19, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@unclemusclez

Prerequisites

  • I have read the documentation.
  • I have checked other issues for similar problems.

Backend

Local

Interface Used

CLI

CLI Command

autotrain app --host 0.0.0.0 --port 7000

UI Screenshots & Parameters

No response

Error Logs

__main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm

INFO     | 2024-10-19 23:01:18 | autotrain.commands:launch_command:524 - {'model': 'unsloth/Qwen2.5-Coder-7B-Instruct', 'project_name': 'autotrain-126tb-pvpyu4', 'data_path': 'skratos115/opendevin_DataDevinator', 'train_split': 'train', 'valid_split': None, 'add_eos_token': True, 'block_size': 2048, 'model_max_length': 2048, 'padding': 'right', 'trainer': 'sft', 'use_flash_attention_2': False, 'log': 'tensorboard', 'disable_gradient_checkpointing': False, 'logging_steps': -1, 'eval_strategy': 'epoch', 'save_total_limit': 1, 'auto_find_batch_size': False, 'mixed_precision': 'fp16', 'lr': 1e-06, 'epochs': 1, 'batch_size': 1, 'warmup_ratio': 0.1, 'gradient_accumulation': 4, 'optimizer': 'adamw_torch', 'scheduler': 'linear', 'weight_decay': 0.0, 'max_grad_norm': 1.0, 'seed': 42, 'chat_template': 'none', 'quantization': 'int4', 'target_modules': 'all-linear', 'merge_adapter': False, 'peft': True, 'lora_r': 16, 'lora_alpha': 32, 'lora_dropout': 0.05, 'model_ref': None, 'dpo_beta': 0.1, 'max_prompt_length': 128, 'max_completion_length': None, 'prompt_text_column': 'prompt', 'text_column': 'text', 'rejected_text_column': 'rejected_text', 'push_to_hub': True, 'username': 'unclemusclez', 'token': '*****', 'unsloth': True, 'distributed_backend': 'none'}
INFO     | 2024-10-19 23:01:18 | autotrain.backends.local:create:25 - Training PID: 57326
INFO:     192.168.2.69:65250 - "POST /ui/create_project HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO:     192.168.2.69:65250 - "GET /ui/accelerators HTTP/1.1" 200 OK
usage: __main__.py [-h] --training_config TRAINING_CONFIG
__main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision bf16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm --mixed_precision fp16 -m autotrain.trainers.clm
Traceback (most recent call last):
  File "/usr/local/open-webui/.venv/bin/accelerate", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 1174, in launch_command
    simple_launcher(args)
  File "/usr/local/open-webui/.venv/lib/python3.12/site-packages/accelerate/commands/launch.py", line 769, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/local/open-webui/.venv/bin/python', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'bf16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'bf16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu3/training_params.json', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu4/training_params.json', '--mixed_precision', 'fp16', '-m', 'autotrain.trainers.clm', '--training_config', 'autotrain-126tb-pvpyu4/training_params.json']' returned non-zero exit status 2.
INFO:     192.168.2.69:65250 - "GET /ui/is_model_training HTTP/1.1" 200 OK
INFO     | 2024-10-19 23:01:34 | autotrain.app.utils:get_running_jobs:40 - Killing PID: 57326
INFO     | 2024-10-19 23:01:34 | autotrain.app.utils:kill_process_by_pid:90 - Sent SIGTERM to process with PID 57326

Additional Information

Running the Local backend; it seems to double up the flags, and then keeps doing so every time training is run.
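
For what it's worth, the duplicated arguments in the error log have the shape you get when a launch command is built by extending a shared list in place instead of copying it for each run. Below is a minimal, hypothetical Python sketch of that pattern; the names are made up for illustration and are not claimed to be autotrain's actual internals (the real command assembly happens in autotrain.commands:launch_command per the log above).

BASE_CMD = ["accelerate", "launch"]  # shared for the lifetime of the app process

def build_launch_command(config_path, mixed_precision):
    cmd = BASE_CMD  # bug: aliases the shared list instead of copying it
    cmd += [
        "-m", "autotrain.trainers.clm",
        "--training_config", config_path,
        "--mixed_precision", mixed_precision,
    ]
    return cmd

# First launch in the process builds the expected command.
print(build_launch_command("run1/training_params.json", "bf16"))

# A second launch in the same process keeps the earlier flags and appends new
# ones, producing the repeated --mixed_precision / -m arguments seen above.
print(build_launch_command("run2/training_params.json", "fp16"))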

@unclemusclez unclemusclez added the bug Something isn't working label Oct 19, 2024
@unclemusclez unclemusclez changed the title [BUG] __main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm - [BUG] (Duplicate Flag Generation)_ __main__.py: error: unrecognized arguments: --mixed_precision bf16 -m autotrain.trainers.clm Oct 19, 2024
@abhishekkrthakur
Member

Is it an issue with the latest version? 🤔

@unclemusclez
Author

This occurs when a job fails and you try to run the job again within the same instance. At the moment, the only solution I have found is to turn the application off and back on.

This was from the recent origin main, but I noticed the same issue a week or so ago with a non-updated version as well.

It also seems to be particular to SFT training.
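
If the flags are accumulating in a shared in-memory command, that would also explain why restarting the app clears it. A per-launch copy of the base command (again a hedged sketch with made-up names, not a patch against autotrain itself) would keep a failed run from leaking its flags into the next one:

def build_launch_command(config_path, mixed_precision):
    cmd = list(BASE_CMD)  # per-launch copy, so repeated runs never accumulate flags
    cmd += [
        "-m", "autotrain.trainers.clm",
        "--training_config", config_path,
        "--mixed_precision", mixed_precision,
    ]
    return cmd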
