
Add support for NTK-by-Part Rotary Embedding & set correct rotary base for Llama-3.1 series #763

Closed
wants to merge 4 commits

Conversation

Hzfinfdu
Contributor

Description

Thanks to #691 and #761 for proposing and supporting Llama-3.1 models in TL.
However, two modifications made to the rotary embedding are not yet implemented here.

  • Llama 3.1 uses a rope_theta (rotary_base in TL) of 500,000 for both the 8B and 70B models.
  • Llama 3.1 models use the NTK-by-Parts Rotary Embedding introduced in Section 3.2 of https://arxiv.org/pdf/2309.00071, where they:

    introduce two extra parameters α, β. All hidden dimensions d where r(d) < α are those where we linearly
    interpolate by a scale s (exactly like PI, avoiding any extrapolation), and the d where r(d) > β are
    those where we do not interpolate at all.

This PR implements both changes and significantly decreases the max logit diff from >1 to about 0.2 for Llama-3.1-8B.

No new dependencies are required for this change.
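
For reference, below is a minimal sketch of the NTK-by-Parts inverse-frequency adjustment as applied to Llama 3.1, mirroring the "llama3" rope-scaling recipe used by Hugging Face transformers. The helper name and default values (scale factor 8, low/high frequency factors 1 and 4, original context length 8192) are illustrative assumptions taken from the Llama-3.1 config defaults, not necessarily the exact code added in this PR.

```python
import torch

def ntk_by_parts_inv_freq(
    dim: int = 128,                  # rotary dim per head (illustrative)
    rotary_base: float = 500_000.0,  # rope_theta for Llama 3.1
    factor: float = 8.0,             # linear interpolation scale s
    low_freq_factor: float = 1.0,    # threshold on the long-wavelength side
    high_freq_factor: float = 4.0,   # threshold on the short-wavelength side
    old_context_len: int = 8192,     # original pretraining context length
) -> torch.Tensor:
    """Inverse RoPE frequencies with NTK-by-Parts (piecewise) scaling."""
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0, 1, ..., d/2 - 1.
    inv_freq = rotary_base ** (-torch.arange(0, dim, 2).float() / dim)
    wavelen = 2 * torch.pi / inv_freq

    low_freq_wavelen = old_context_len / low_freq_factor
    high_freq_wavelen = old_context_len / high_freq_factor

    # Long-wavelength (low-frequency) dims: interpolate by the full scale s.
    scaled = torch.where(wavelen > low_freq_wavelen, inv_freq / factor, inv_freq)

    # Intermediate dims: blend smoothly between interpolated and original.
    smooth = (old_context_len / wavelen - low_freq_factor) / (
        high_freq_factor - low_freq_factor
    )
    smoothed = (1 - smooth) * (inv_freq / factor) + smooth * inv_freq
    is_medium = (wavelen <= low_freq_wavelen) & (wavelen >= high_freq_wavelen)

    # Short-wavelength (high-frequency) dims fall through unchanged.
    return torch.where(is_medium, smoothed, scaled)
```

These scaled frequencies would then replace the plain base^(-2i/d) values when building the rotary sin/cos cache, alongside setting rotary_base to 500,000.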

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Please attach before and after screenshots of the change if applicable.

Before: [screenshot]
After: [screenshot]

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

@Hzfinfdu closed this Oct 25, 2024