Fix text generation quality for bf16 models when sampling #1644

Open · wants to merge 2 commits into main

Conversation

skavulya (Contributor)

What does this PR do?

Improves the quality of text generated by bf16 models when sampling. This change also affects multi-card inference with DeepSpeed in run_generation.py, which uses bf16 by default.

For example:
python run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
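
For the multi-card DeepSpeed case mentioned above, a run would typically go through the optimum-habana launcher; this is a sketch mirroring the single-card command (the world size is an assumption, and per the description bf16 is already the default in this path):
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"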

Before fix: main a51475f
Here is my prompt for them to see this thread Quote courtesy of set-up-example.com Waaah awaaaaah. Whoe wooooohhhhhhhhhhhhhhhh. What's that? Haskell version in printable format?\n\[email protected]\n\nMy HOST ROCKS FOR MY CORE:\n\nacmme29@design's turntable.mac.ca:20187 Rational: why can't I mod ram?\n\n%~v1-pc

After fix:
Here is my prompt for your comment, if you wish to share with a new person:In order to maintain a good balance of health and safety, a number of initiatives have been put forth. These include protecting animals, preventing accidental death, and establishing and monitoring healthy cages and the care of animals in humane environments. These measures improve safety and quality of life for animal owners and the public alike.Introduction\n\nThe present study compared population densities of the Australian and New Zealand species of moss, plants, and

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@skavulya (Contributor, Author)

@libinta @jiminha @regisss Please review this change. I am running the transformers tests and the text-generation README examples to check for regressions. Please let me know if there are additional tests that you would like me to run.

@skavulya changed the title from "Improve text generation quality for bf16 models when sampling" to "Fix text generation quality for bf16 models when sampling" on Dec 20, 2024
@skavulya (Contributor, Author)

Here are the results of the transformers tests on the v1.15-release branch.

GAUDI2_CI=1 RUN_SLOW=true python -m pytest tests/transformers/tests/models/

  • v1.15-release commit 1c16727 ==== 16 failed, 988 passed, 431 skipped, 96 warnings in 1167.08s (0:19:27) =====

  • v1.15-release commit 1c16727 + This PR ==== 16 failed, 988 passed, 431 skipped, 95 warnings in 1385.51s (0:23:05) =====

@skavulya (Contributor, Author)

Based on discussion with @jiminha, we will keep the upcast to float() introduced in huggingface/transformers@22e6f14#diff-26783ca033d92b4ce2e01eb691dbf5b05dd972b0cbfdc69fc726cf77a9dcb011

The quality of text generated by some lower-precision models is affected because the upcast of next_token_logits to float changes the distribution sampled by torch.multinomial. The workaround is to cast back to the original data type before sampling.
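
A minimal sketch of that workaround, assuming the variable names of the transformers-style sampling loop (the helper name and the exact integration point in optimum-habana's generation utilities are assumptions, as is torch.multinomial's dtype support on a given device):

  import torch

  def sample_next_token(next_token_logits: torch.Tensor, model_dtype: torch.dtype) -> torch.Tensor:
      # transformers upcasts next_token_logits to float32, so the softmax
      # runs in full precision.
      probs = torch.nn.functional.softmax(next_token_logits.float(), dim=-1)
      # Cast the distribution back to the model's original dtype (e.g.
      # torch.bfloat16) before sampling, restoring the pre-upcast sampling
      # behavior. (Assumes torch.multinomial accepts this dtype on the
      # target device.)
      probs = probs.to(model_dtype)
      return torch.multinomial(probs, num_samples=1).squeeze(1)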

Here are examples of outputs for different models:
GPT2
Main 425bac7
python run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Here is my prompt for them to see this thread Quote courtesy of set-up-example.com Waaah awaaaaah. Whoe wooooohhhhhhhhhhhhhhhh. What's that? Haskell version in printable format?\n\[email protected]\n\nMy HOST ROCKS FOR MY CORE:\n\nacmme29@design's turntable.mac.ca:20187 Rational: why can't I mod ram?\n\n%~v1-pc

This PR 090527c
Here is my prompt for your comment, if you wish to share with a new person:In order to maintain a good balance of health and safety, a number of initiatives have been put forth. These include protecting animals, preventing accidental death, and establishing and monitoring healthy cages and the care of animals in humane environments. These measures improve safety and quality of life for animal owners and the public alike.Introduction\n\nThe present study compared population densities of the Australian and New Zealand species of moss, plants, and

Falcon-7b
python run_generation.py --model_name_or_path tiiuae/falcon-7b --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Main 425bac7
Here is my prompt for YSi! Enjoy!\n2.Your favorite line or dialogue from any film this year!\nMy favorite dialogue from a film or movie this year was from Captain America: The First Avenger. If you haven’t watched it, do so. Avenger’s THE MOMENT you need to feel some patriotism. The “I want to do anything” line is iconic.\nI thought I could do a good write-up on this, as I have a (',)

This PR 090527c
Here is my prompt for this week - I will be looking forward for some interesting responses...\nI am afraid my mind is blank when it comes to things I am looking forward to.\nAll I can think about is that I will be coming home tomorrow!\nI have been gone for three months to a foreign country and I am ready to return to my own bed, my own life, my own city, my own country.\nI hope I shall sleep really well once I am home, because I have

Llama-7b
python run_generation.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Main 425bac7
Here is my prompt for today:\nTell us about a time when you were wrong.\nI don't know about you, but I'm usually wrong. So, I'm going to tell you about a time when I was wrong, and I'm going to tell you about a time when I was right.\nI was wrong when I thought I was too old to have kids. I was wrong when I thought I was too old to get married. I was wrong when I thought

This PR 090527c
Here is my prompt for today:\nTell us about a time when you were wrong.\nI don't know about you, but I'm usually wrong. So, I'm going to tell you about a time when I was wrong, and I'm going to tell you about a time when I was right.\nI was wrong when I thought I was too old to have kids. I was wrong when I thought I was too old to get married. I was wrong when I thought

@skavulya (Contributor, Author)

GAUDI2_CI=1 RUN_SLOW=true python -m pytest tests/transformers/tests/models/
Main 425bac7 ==== 17 failed, 987 passed, 431 skipped, 96 warnings in 1167.28s (0:19:27) =====
This PR 090527c ==== 17 failed, 987 passed, 431 skipped, 96 warnings in 1335.68s (0:22:15) =====
