Commit

Merge branch 'NTK-by-part-rope' of https://github.com/Hzfinfdu/TransformerLens into NTK-by-part-rope
Hzfinfdu committed Oct 26, 2024
2 parents cbb652c + d22b0ee commit 736da05
Showing 1 changed file with 7 additions and 7 deletions.
14 changes: 7 additions & 7 deletions transformer_lens/HookedTransformerConfig.py
@@ -181,16 +181,16 @@ class HookedTransformerConfig:
  output_logits_soft_cap (float): An optional softcap for output logits, currently only used
  in Gemma-2 (see attn_scores_soft_cap for details). Defaults to -1.0, which means not
  set.
- use_NTK_by_parts_rope (bool): Whether to apply the "NTK-by-parts" method when using Rotary
- Positional Embedding. This method adjusts the interpolation based on frequency factors
- for different parts of the hidden dimensions. See Section 3.2 in
+ use_NTK_by_parts_rope (bool): Whether to apply the "NTK-by-parts" method when using Rotary
+ Positional Embedding. This method adjusts the interpolation based on frequency factors
+ for different parts of the hidden dimensions. See Section 3.2 in
  https://arxiv.org/pdf/2309.00071 for details. Defaults to False.
- NTK_by_parts_low_freq_factor (float): The threshold applied to low-frequency hidden
+ NTK_by_parts_low_freq_factor (float): The threshold applied to low-frequency hidden
  dimensions during interpolation when using the "NTK-by-parts" method. Defaults to 1.0.
- NTK_by_parts_high_freq_factor (float): The threshold applied to high-frequency hidden
+ NTK_by_parts_high_freq_factor (float): The threshold applied to high-frequency hidden
  dimensions during interpolation in the "NTK-by-parts" method. Defaults to 4.0.
- NTK_by_parts_factor (float): The overall factor used in the "NTK-by-parts" method that
- affects the rate of change between low and high-frequency interpolation strategies.
+ NTK_by_parts_factor (float): The overall factor used in the "NTK-by-parts" method that
+ affects the rate of change between low and high-frequency interpolation strategies.
  Defaults to 8.0.
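
For readers unfamiliar with the three NTK_by_parts_* parameters documented in this hunk, the following is a minimal sketch of how such factors are commonly combined to rescale rotary frequencies. It is an illustration under stated assumptions, not a transcription of the TransformerLens implementation: the function name ntk_by_parts_inv_freq, the original_context_len argument and its value of 8192, and the rotary base of 10000 are all hypothetical choices made for this example. Only the defaults 1.0, 4.0, and 8.0 come from the docstring above.

```python
import math


def ntk_by_parts_inv_freq(
    rotary_dim,
    base=10000.0,                # assumed rotary base, not stated in the diff
    factor=8.0,                  # NTK_by_parts_factor default from the docstring
    low_freq_factor=1.0,         # NTK_by_parts_low_freq_factor default
    high_freq_factor=4.0,        # NTK_by_parts_high_freq_factor default
    original_context_len=8192,   # assumed pre-extension context length
):
    """Return per-dimension-pair inverse frequencies after NTK-by-parts scaling."""
    # Standard RoPE inverse frequencies, one per pair of rotary dimensions.
    inv_freq = [base ** (-2 * i / rotary_dim) for i in range(rotary_dim // 2)]

    # Wavelength thresholds derived from the two frequency factors.
    low_freq_wavelen = original_context_len / low_freq_factor
    high_freq_wavelen = original_context_len / high_freq_factor

    scaled = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < high_freq_wavelen:
            # High-frequency dimensions: left unchanged (no interpolation).
            scaled.append(f)
        elif wavelen > low_freq_wavelen:
            # Low-frequency dimensions: fully interpolated by the overall factor.
            scaled.append(f / factor)
        else:
            # In-between dimensions: blend linearly between the two regimes.
            smooth = (original_context_len / wavelen - low_freq_factor) / (
                high_freq_factor - low_freq_factor
            )
            scaled.append((1 - smooth) * f / factor + smooth * f)
    return scaled


freqs = ntk_by_parts_inv_freq(rotary_dim=128)
```

With these assumed settings, the fastest-rotating pair keeps its original frequency while the slowest-rotating pair is divided by the overall factor; pairs whose wavelength lies between the two thresholds receive a weighted mix of both.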
