
Can BAAI/bge-m3 be supported? #4

Open
sweetcard opened this issue Feb 4, 2024 · 2 comments

@sweetcard

Thank you for your excellent work.

bge-m3 is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

When running the following command:
python convert-to-ggml.py './bge-m3' f16

Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory: './bge-m3/vocab.txt'

Will you make some changes to convert-to-ggml.py to support the new model?
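The error happens because the script expects a WordPiece-style vocab.txt, while XLM-RoBERTa checkpoints like bge-m3 ship a sentencepiece model file instead. As a minimal sketch (the helper name and file list are assumptions, not the script's actual code), a conversion script could probe for either tokenizer layout before failing:

```python
import os

# Hypothetical helper: WordPiece-based BERT checkpoints ship a vocab.txt,
# while XLM-RoBERTa checkpoints (like bge-m3) ship sentencepiece.bpe.model.
def find_vocab_file(model_dir):
    """Return the path to whichever tokenizer vocab file exists in model_dir."""
    for name in ("vocab.txt", "sentencepiece.bpe.model"):
        path = os.path.join(model_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no tokenizer vocab file found in {model_dir}")
```

With a check like this, the script could branch to a sentencepiece-aware loading path instead of raising on the missing vocab.txt.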

@iamlemec
Owner

iamlemec commented Feb 4, 2024

Yup, definitely want to support the new magic from BAAI. It looks like they use a different tokenizer (XLMRobertaTokenizer) and a slightly different model architecture (xlm-roberta). I think we can copy over some of the more general vocab conversion strategies from llama.cpp/convert.py and then tweak the model code a bit.
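One tokenizer-agnostic strategy along those lines (a sketch, not the actual convert.py code): both WordPiece and sentencepiece tokenizers in Hugging Face expose a token-to-id mapping via tokenizer.get_vocab(), so the converter can sort that mapping by id and write the tokens as a flat, id-ordered list, regardless of which tokenizer produced it:

```python
# Sketch of a tokenizer-agnostic vocab export: given a token -> id mapping
# (the shape returned by HF's tokenizer.get_vocab()), emit tokens in id
# order so the converted model file can store them as a flat list.
def vocab_in_id_order(token_to_id):
    """Return the token list sorted by id, checking that ids are contiguous."""
    items = sorted(token_to_id.items(), key=lambda kv: kv[1])
    ids = [i for _, i in items]
    assert ids == list(range(len(ids))), "vocab ids must be contiguous from 0"
    return [tok for tok, _ in items]
```

The contiguity check matters because a gap in the id space would silently misalign every later token against the embedding rows.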

If you have any tips or ideas on this, I'm all ears. Either way, will be looking into this.

@iamlemec
Owner

iamlemec commented Feb 5, 2024

Ok, I think it's basically working. The embeddings are still slightly different from what Hugging Face gives, but they're pretty close. It's possible there are one or two things I'm not getting quite right.

Will keep refining in the coming days.
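One way to make "pretty close" quantitative is to compare the converted model's embedding against the Hugging Face reference with cosine similarity. A self-contained sketch (the vectors below are toy stand-ins, not real model outputs):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A similarity very close to 1.0 across a batch of test sentences would suggest only minor numerical drift (e.g. from f16 quantization) rather than a logic bug in the conversion.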
