Train_loss = 0 and Eval_loss = NaN in stage2_sft #31
Hi, is your training dataset very small? Maybe you could try a larger one?
This code seems to work fine on my data. Ideally, the targets for every token outside the model's response are set to -100, which means no loss is computed on them. We do this because it is common practice in instruction fine-tuning. We also tried skipping the masking and computing the loss over the entire sequence, and I don't think there was much difference.
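A minimal pure-Python sketch of the masking described above. It assumes labels are built by copying `input_ids` and that -100 marks ignored positions (the default `ignore_index` of PyTorch's cross-entropy loss). The helpers `mask_non_response` and `masked_mean_nll` are hypothetical, and the sketch skips the one-position label shift a causal LM applies. It also shows one way an eval loss of NaN can appear: if every label in a sample is -100, the mean is taken over zero tokens, i.e. 0/0.

```python
import math

IGNORE_INDEX = -100  # label value excluded from the loss


def mask_non_response(input_ids, response_spans):
    """Build labels that keep only tokens inside the model-response spans.

    response_spans: list of half-open (start, end) index ranges (hypothetical format).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in response_spans:
        labels[start:end] = input_ids[start:end]
    return labels


def masked_mean_nll(token_logprobs, labels):
    """Mean negative log-likelihood over positions whose label is not ignored.

    token_logprobs[i] is the model's log-probability of the target at position i.
    Returns NaN when every label is ignored, since there is nothing to average.
    """
    kept = [lp for lp, lab in zip(token_logprobs, labels) if lab != IGNORE_INDEX]
    if not kept:
        return float("nan")
    return -sum(kept) / len(kept)


input_ids = [1, 10, 11, 12, 20, 21, 2]  # toy ids: prompt 1..12, response 20, 21, 2
labels = mask_non_response(input_ids, [(4, 7)])
print(labels)                                        # [-100, -100, -100, -100, 20, 21, 2]
print(masked_mean_nll([-0.5] * 7, labels))           # 0.5
print(math.isnan(masked_mean_nll([-0.5] * 3, [IGNORE_INDEX] * 3)))  # True
```

If preprocessing masks an entire sample, that sample contributes no supervised tokens, which is worth checking before blaming the optimizer or the data size.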
Your problem seems similar to this issue: lm-sys/FastChat#3266 (comment), and I hope my solution helps. You can change AnyGPT/anygpt/src/train/stage2_sft.py, lines 257 to 260 in 282b58f, to the following:

```python
for i, turn in enumerate(turns):
    if turn == "":
        break
    turn += conv.sep2  # append sep2 to the turn
    turn_len = len(tokenizer(turn).input_ids) - 1  # subtract the SOS token from the turn length
```
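Why the `- 1` matters, as a toy sketch: when each turn is tokenized separately, a tokenizer that adds a start-of-sequence (SOS) token counts it once per turn, while the full conversation contains it only once. The whitespace tokenizer below is hypothetical, the turns already include their separator (standing in for `conv.sep2`), and the running offset is assumed to start at 1 for the single leading SOS.

```python
# Hypothetical toy tokenizer: splits on whitespace and, like many LLaMA-style
# tokenizers, prepends a start-of-sequence (SOS) token on every call.
SOS = "<s>"


def toy_tokenize(text):
    return [SOS] + text.split()


def running_length(turns, subtract_sos):
    """Replay the per-turn offset bookkeeping over a conversation."""
    cur_len = 1  # assumed: the single SOS at the start of the full sequence
    for turn in turns:
        turn_len = len(toy_tokenize(turn))
        if subtract_sos:
            turn_len -= 1  # each separate call re-adds an SOS the full sequence lacks
        cur_len += turn_len
    return cur_len


turns = ["USER: hi ASSISTANT: hello </s>", "USER: bye ASSISTANT: ok </s>"]
total_len = len(toy_tokenize(" ".join(turns)))  # 11 tokens

print(running_length(turns, subtract_sos=False), total_len)  # 13 11: drifts by 1 per turn
print(running_length(turns, subtract_sos=True), total_len)   # 11 11: aligned
```

Without the correction, the offsets used to mask instruction tokens drift further with every turn, so the masks land on the wrong tokens. In FastChat-style preprocessing, a final length mismatch like this can cause the whole sample to be masked, leaving nothing to compute a loss on.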
Hello!
Thank you for your work on MLLMs.
I ran into a fine-tuning bug that I couldn't fix: when I ran the `stage2_sft.sh` script and trained with `speech_conv_datasets` only, the logger showed a train loss of 0 throughout and an eval loss of NaN, as shown in the figure.
The command in `stage2_sft.sh` is as follows:
I'm using the following Python environment: