Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set prepend_bos to false by default for Bloom model family #775

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

degenfabian
Copy link

@degenfabian degenfabian commented Nov 8, 2024

Description

This PR addresses Issue #774. Bloom models output insensible output when the prepend_bos flag is set to True, so for usability this flag should be set to False by default for the Bloom model family. I'm happy about ideas to implement this more cleanly, because with the current implementation the flag can practically not be set to True anymore, unless it is overwritten when specifically calling a function like generate. I think that's an unpretty solution but since I can't see a use case in which it would make sense to set the flag to True I think it's not terrible, but again very happy for suggestions.

Fixes #774

Type of change

Please delete options that are not relevant.

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Screenshots

For Screenshots of the difference in the generation of the models please see Issue #774.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@degenfabian degenfabian marked this pull request as ready for review November 8, 2024 03:34
@degenfabian
Copy link
Author

Interestingly, adding this if clause makes the acceptance test fail although it fixes the outputs being totally off. I will investigate this further tomorrow.

@degenfabian degenfabian marked this pull request as draft November 8, 2024 04:06
@degenfabian
Copy link
Author

Okay I just had a look, and the previous expected loss value for the bloom-560m model was wrong and not what I got as loss when I ran the prompt with the huggingface implementation. Setting prepend_bos to False by default has brought the loss of the TransformerLens implementation much closer to the loss of the hugging face implementation, but it is still quite not there. I at the moment am not sure how to fix this, but I hope this might get fixed by some other inaccuracy tweaks across the project. For the time being I will focus on that and hope that this will be resolved by that too.

@degenfabian
Copy link
Author

Interestingly, I just ran the hugging face implementation on Google Colab to see if I would get a different value than on my Mac, and indeed I did. The loss value I got from Colab now matches what we get from TransformerLens with prepend_bos set to false close enough to be within the error margin, so hooray I guess. I'm slightly confused by this, does anyone have an idea as to why I would get a different loss from the hugging face implementation on my Mac then on Google Colab?

@degenfabian degenfabian marked this pull request as ready for review November 8, 2024 15:45
@degenfabian
Copy link
Author

degenfabian commented Nov 8, 2024

I just tested this with gpt2-small, and here the output with prepend_bos set to True is not completely unusable but still deviates from Huggingface. If we just want to match HuggingFace as closely as possible, I could imagine adding this in loading_from_pretrained.py and setting the prepend_bos model accordingly for each model we load from Huggingface, which also seems like a much prettier solution then what I did so far. I would be happy to do that and rename this PR and issue, what do you think @bryce13950 ?

@bryce13950
Copy link
Collaborator

Let's talk about this a bit to get up to speed, and figure out if we can find where the divergence is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Proposal] prepend_bos should by default be set to false for the Bloom model family
2 participants