-
Notifications
You must be signed in to change notification settings - Fork 304
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set prepend_bos to false by default for Bloom model family #775
base: main
Are you sure you want to change the base?
Set prepend_bos to false by default for Bloom model family #775
Conversation
Interestingly, adding this if clause makes the acceptance test fail although it fixes the outputs being totally off. I will investigate this further tomorrow. |
Okay I just had a look, and the previous expected loss value for the bloom-560m model was wrong and not what I got as loss when I ran the prompt with the huggingface implementation. Setting prepend_bos to False by default has brought the loss of the TransformerLens implementation much closer to the loss of the hugging face implementation, but it is still quite not there. I at the moment am not sure how to fix this, but I hope this might get fixed by some other inaccuracy tweaks across the project. For the time being I will focus on that and hope that this will be resolved by that too. |
Interestingly, I just ran the hugging face implementation on Google Colab to see if I would get a different value than on my Mac, and indeed I did. The loss value I got from Colab now matches what we get from TransformerLens with prepend_bos set to false close enough to be within the error margin, so hooray I guess. I'm slightly confused by this, does anyone have an idea as to why I would get a different loss from the hugging face implementation on my Mac then on Google Colab? |
I just tested this with gpt2-small, and here the output with prepend_bos set to True is not completely unusable but still deviates from Huggingface. If we just want to match HuggingFace as closely as possible, I could imagine adding this in loading_from_pretrained.py and setting the prepend_bos model accordingly for each model we load from Huggingface, which also seems like a much prettier solution then what I did so far. I would be happy to do that and rename this PR and issue, what do you think @bryce13950 ? |
Let's talk about this a bit to get up to speed, and figure out if we can find where the divergence is. |
Description
This PR addresses Issue #774. Bloom models output insensible output when the prepend_bos flag is set to True, so for usability this flag should be set to False by default for the Bloom model family. I'm happy about ideas to implement this more cleanly, because with the current implementation the flag can practically not be set to True anymore, unless it is overwritten when specifically calling a function like generate. I think that's an unpretty solution but since I can't see a use case in which it would make sense to set the flag to True I think it's not terrible, but again very happy for suggestions.
Fixes #774
Type of change
Please delete options that are not relevant.
Screenshots
For Screenshots of the difference in the generation of the models please see Issue #774.
Checklist: