

Leverage existing projection layers from ST models #3

Open
NohTow opened this issue Jun 13, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@NohTow
Collaborator

NohTow commented Jun 13, 2024

Right now, when initializing from an ST checkpoint, we chop off any "Dense" module.
Although these checkpoints require training anyway, this layer can provide a good initialization for the linear projection.

We can either merge the LinearLayer class into a Dense one (they are basically the same, except for the activation function, which could be set to None with a small modification to the original class), or we can copy the weights into the LinearLayer.

We should handle a possible mismatch between the checkpoint's output dimension and the configured one, and either prevent the weights from being loaded or show a warning.
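A minimal sketch of the weight-copying option, assuming the standard `sentence_transformers` `Dense` module layout; the function name and the `expected_dim` argument are illustrative, not the actual PyLate API:

```python
import warnings

from sentence_transformers import SentenceTransformer
from sentence_transformers.models import Dense


def dense_init_weights(model_name: str, expected_dim: int):
    """Return the state dict of the checkpoint's Dense projection, if usable.

    Illustrative helper, not the actual PyLate API: `expected_dim` stands for
    the projection dimension declared in our configuration.
    """
    st_model = SentenceTransformer(model_name)
    # A SentenceTransformer is a Sequential of modules; look for a Dense one.
    dense_modules = [module for module in st_model if isinstance(module, Dense)]
    if not dense_modules:
        return None  # no projection layer in the checkpoint, keep random init
    dense = dense_modules[0]
    out_dim = dense.linear.out_features
    if out_dim != expected_dim:
        # Output dimension differs from the configuration: warn and skip loading.
        warnings.warn(
            f"Dense module outputs {out_dim} dims but the configuration expects "
            f"{expected_dim}; keeping the random initialization."
        )
        return None
    # Weights of the underlying nn.Linear, ready to copy into the LinearLayer.
    return {key: value.clone() for key, value in dense.linear.state_dict().items()}
```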

@NohTow NohTow added the enhancement New feature or request label Jun 13, 2024
@NohTow
Collaborator Author

NohTow commented Aug 22, 2024

Done in #37.

Leaving this open, since we can still do better by loading the weights of linear layers from pre-v3 ST models where they are not stored in a separate layer, such as https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.
