Fix text generation quality for bf16 models when sampling #1644

Open · wants to merge 2 commits into main

Conversation

skavulya (Contributor)

What does this PR do?

Improves the quality of text generated by bf16 models when sampling. This change also affects multi-card inference with DeepSpeed in run_generation.py, which uses bf16 by default.

For example:
python run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
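
For the multi-card DeepSpeed case mentioned above, a run would typically go through the optimum-habana launcher; this is a sketch mirroring the single-card command (the world size is an assumption, and per the description bf16 is already the default in this path):
python ../gaudi_spawn.py --use_deepspeed --world_size 8 run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt"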

Before fix: main a51475f
Here is my prompt for them to see this thread Quote courtesy of set-up-example.com Waaah awaaaaah. Whoe wooooohhhhhhhhhhhhhhhh. What's that? Haskell version in printable format?\n\[email protected]\n\nMy HOST ROCKS FOR MY CORE:\n\nacmme29@design's turntable.mac.ca:20187 Rational: why can't I mod ram?\n\n%~v1-pc

After fix:
Here is my prompt for your comment, if you wish to share with a new person:In order to maintain a good balance of health and safety, a number of initiatives have been put forth. These include protecting animals, preventing accidental death, and establishing and monitoring healthy cages and the care of animals in humane environments. These measures improve safety and quality of life for animal owners and the public alike.Introduction\n\nThe present study compared population densities of the Australian and New Zealand species of moss, plants, and

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

@skavulya (Contributor, Author)

@libinta @jiminha @regisss Please review this change. I am running the transformers tests and the text-generation README examples to check for regressions. Please let me know if there are additional tests that you would like me to run.

@skavulya changed the title from "Improve text generation quality for bf16 models when sampling" to "Fix text generation quality for bf16 models when sampling" on Dec 20, 2024
@skavulya (Contributor, Author)

Here are the results of the transformers tests on the v1.15-release branch.

GAUDI2_CI=1 RUN_SLOW=true python -m pytest tests/transformers/tests/models/

  • v1.15-release commit 1c16727 ==== 16 failed, 988 passed, 431 skipped, 96 warnings in 1167.08s (0:19:27) =====

  • v1.15-release commit 1c16727 + This PR ==== 16 failed, 988 passed, 431 skipped, 95 warnings in 1385.51s (0:23:05) =====

@skavulya (Contributor, Author)

Based on discussion with @jiminha, we will keep the upcast to float() introduced in huggingface/transformers@22e6f14#diff-26783ca033d92b4ce2e01eb691dbf5b05dd972b0cbfdc69fc726cf77a9dcb011

The quality of text generated by some lower-precision models is affected because the upcast of next_token_logits to float changes the distribution sampled by torch.multinomial. The workaround is to cast back to the original data type before sampling.
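
A minimal sketch of that workaround, assuming the variable names of the transformers-style sampling loop (the helper name and the exact integration point in optimum-habana's generation utilities are assumptions, as is torch.multinomial's dtype support on a given device):

  import torch

  def sample_next_token(next_token_logits: torch.Tensor, model_dtype: torch.dtype) -> torch.Tensor:
      # transformers upcasts next_token_logits to float32, so the softmax
      # runs in full precision.
      probs = torch.nn.functional.softmax(next_token_logits.float(), dim=-1)
      # Cast the distribution back to the model's original dtype (e.g.
      # torch.bfloat16) before sampling, restoring the pre-upcast sampling
      # behavior. (Assumes torch.multinomial accepts this dtype on the
      # target device.)
      probs = probs.to(model_dtype)
      return torch.multinomial(probs, num_samples=1).squeeze(1)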

Here are examples of outputs for different models:
GPT2
Main 425bac7
python run_generation.py --model_name_or_path gpt2 --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Here is my prompt for them to see this thread Quote courtesy of set-up-example.com Waaah awaaaaah. Whoe wooooohhhhhhhhhhhhhhhh. What's that? Haskell version in printable format?\n\[email protected]\n\nMy HOST ROCKS FOR MY CORE:\n\nacmme29@design's turntable.mac.ca:20187 Rational: why can't I mod ram?\n\n%~v1-pc

This PR 090527c
Here is my prompt for your comment, if you wish to share with a new person:In order to maintain a good balance of health and safety, a number of initiatives have been put forth. These include protecting animals, preventing accidental death, and establishing and monitoring healthy cages and the care of animals in humane environments. These measures improve safety and quality of life for animal owners and the public alike.Introduction\n\nThe present study compared population densities of the Australian and New Zealand species of moss, plants, and

Falcon-7b
python run_generation.py --model_name_or_path tiiuae/falcon-7b --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Main 425bac7
Here is my prompt for YSi! Enjoy!\n2.Your favorite line or dialogue from any film this year!\nMy favorite dialogue from a film or movie this year was from Captain America: The First Avenger. If you haven’t watched it, do so. Avenger’s THE MOMENT you need to feel some patriotism. The “I want to do anything” line is iconic.\nI thought I could do a good write-up on this, as I have a (',)

This PR 090527c
Here is my prompt for this week - I will be looking forward for some interesting responses...\nI am afraid my mind is blank when it comes to things I am looking forward to.\nAll I can think about is that I will be coming home tomorrow!\nI have been gone for three months to a foreign country and I am ready to return to my own bed, my own life, my own city, my own country.\nI hope I shall sleep really well once I am home, because I have

Llama-7b
python run_generation.py --model_name_or_path meta-llama/Llama-2-7b-hf --use_kv_cache --max_new_tokens 100 --do_sample --prompt "Here is my prompt" --bf16
Main 425bac7
Here is my prompt for today:\nTell us about a time when you were wrong.\nI don't know about you, but I'm usually wrong. So, I'm going to tell you about a time when I was wrong, and I'm going to tell you about a time when I was right.\nI was wrong when I thought I was too old to have kids. I was wrong when I thought I was too old to get married. I was wrong when I thought

This PR 090527c
Here is my prompt for today:\nTell us about a time when you were wrong.\nI don't know about you, but I'm usually wrong. So, I'm going to tell you about a time when I was wrong, and I'm going to tell you about a time when I was right.\nI was wrong when I thought I was too old to have kids. I was wrong when I thought I was too old to get married. I was wrong when I thought

@skavulya (Contributor, Author)

GAUDI2_CI=1 RUN_SLOW=true python -m pytest tests/transformers/tests/models/
Main 425bac7 ==== 17 failed, 987 passed, 431 skipped, 96 warnings in 1167.28s (0:19:27) =====
This PR 090527c ==== 17 failed, 987 passed, 431 skipped, 96 warnings in 1335.68s (0:22:15) =====
