Release 2.5 #721

Merged: 3 commits, merged on Sep 10, 2024
4 changes: 2 additions & 2 deletions .github/ISSUE_TEMPLATE/bug.md
@@ -17,11 +17,11 @@ Please try to provide a minimal example to reproduce the bug. Error messages and
Describe the characteristic of your environment:
* Describe how `transformer_lens` was installed (pip, docker, source, ...)
* What OS are you using? (Linux, MacOS, Windows)
-* Python version (We suppourt 3.7 -3.10 currently)
+* Python version (We support 3.7--3.10 currently)

**Additional context**
Add any other context about the problem here.

### Checklist

-- [ ] I have checked that there is no similar [issue](https://github.com/TransformerLensOrg/TransformerLens/issues) in the repo (**required**)
+- [ ] I have checked that there is no similar [issue](https://github.com/TransformerLensOrg/TransformerLens/issues) in the repo (**required**)
3 changes: 3 additions & 0 deletions transformer_lens/HookedTransformer.py
@@ -1070,6 +1070,7 @@ def from_pretrained(
default_prepend_bos: bool = True,
default_padding_side: Literal["left", "right"] = "right",
dtype="float32",
+first_n_layers: Optional[int] = None,
**from_pretrained_kwargs,
) -> "HookedTransformer":
"""Load in a Pretrained Model.
@@ -1204,6 +1205,7 @@ def from_pretrained(
the model.
default_padding_side: Which side to pad on when tokenizing. Defaults to
"right".
+first_n_layers: If specified, only load the first n layers of the model.
"""

assert not (
@@ -1261,6 +1263,7 @@ def from_pretrained(
n_devices=n_devices,
default_prepend_bos=default_prepend_bos,
dtype=dtype,
+first_n_layers=first_n_layers,
**from_pretrained_kwargs,
)

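For reference, a minimal usage sketch of the new `first_n_layers` argument added to `HookedTransformer.from_pretrained` in this release. The model name and layer count below are illustrative, not taken from this PR.

```python
# Sketch only: load a truncated copy of a pretrained model using the new argument.
from transformer_lens import HookedTransformer

# Keep only the first 4 transformer blocks; per the change in
# loading_from_pretrained.py below, the resulting config reports n_layers == 4.
model = HookedTransformer.from_pretrained("gpt2", first_n_layers=4)
print(model.cfg.n_layers)  # expected: 4
```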
4 changes: 2 additions & 2 deletions transformer_lens/HookedTransformerConfig.py
@@ -78,7 +78,7 @@ class HookedTransformerConfig:
attention
attn_types (List[str], *optional*): the types of attention to use for
local attention
-weight_init_mode (str): the initialization mode to use for the
+init_mode (str): the initialization mode to use for the
weights. Only relevant for custom models, ignored for pre-trained.
We now support 'gpt2', 'xavier_uniform', 'xavier_normal', 'kaiming_uniform',
'kaiming_normal'. MuP support to come. Defaults to 'gpt2'.
@@ -100,7 +100,7 @@ class HookedTransformerConfig:
Used to set sources of randomness (Python, PyTorch and NumPy) and to initialize weights.
Defaults to None. We recommend setting a seed, so your experiments are reproducible.
initializer_range (float): The standard deviation of the normal used to
-initialise the weights, initialized to 0.8 / sqrt(d_model). If weight_init_mode is
+initialise the weights, initialized to 0.8 / sqrt(d_model). If init_mode is
'xavier_uniform' or 'xavier_normal', this value is instead treated as the `gain` parameter for the weight
initialisation (a constant factor to scale the weights by). Defaults to -1.0, which means not set.
init_weights (bool): Whether to initialize the weights. Defaults to
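A brief sketch of the renamed field in use, assuming `init_mode` is the actual field name on `HookedTransformerConfig` (as the corrected docstring implies); the dimensions and activation function are illustrative.

```python
# Sketch: construct a custom config and pick one of the documented init modes.
from transformer_lens import HookedTransformerConfig

cfg = HookedTransformerConfig(
    n_layers=2,
    d_model=128,
    n_ctx=256,
    d_head=32,
    act_fn="gelu",
    # Documented options: 'gpt2', 'xavier_uniform', 'xavier_normal',
    # 'kaiming_uniform', 'kaiming_normal'.
    init_mode="xavier_uniform",
)
```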
3 changes: 3 additions & 0 deletions transformer_lens/loading_from_pretrained.py
@@ -1389,6 +1389,7 @@ def get_pretrained_model_config(
n_devices: int = 1,
default_prepend_bos: bool = True,
dtype: torch.dtype = torch.float32,
+first_n_layers: Optional[int] = None,
**kwargs,
):
"""Returns the pretrained model config as an HookedTransformerConfig object.
@@ -1501,6 +1502,8 @@ def get_pretrained_model_config(
cfg_dict["default_prepend_bos"] = default_prepend_bos
if hf_cfg is not None:
cfg_dict["load_in_4bit"] = hf_cfg.get("quantization_config", {}).get("load_in_4bit", False)
+if first_n_layers is not None:
+    cfg_dict["n_layers"] = first_n_layers

cfg = HookedTransformerConfig.from_dict(cfg_dict)
return cfg
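The same change sketched at the config level, assuming `get_pretrained_model_config` is reached via the `loading_from_pretrained` module as shown; the model name is again illustrative.

```python
# Sketch: first_n_layers simply overrides n_layers in the returned config.
import transformer_lens.loading_from_pretrained as loading

cfg = loading.get_pretrained_model_config("gpt2", first_n_layers=6)
print(cfg.n_layers)  # expected: 6 (the full GPT-2 small config has 12)
```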