Yup, definitely want to support the new magic from BAAI. It looks like they use a different tokenizer (XLMRobertaTokenizer) and a slightly different model architecture (xlm-roberta). I think we can copy over some of the more general vocab conversion strategies from llama.cpp/convert.py and then tweak the model code a bit.
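One of the more general strategies in llama.cpp/convert.py is mapping SentencePiece pieces (which XLMRobertaTokenizer uses under the hood) back to plain text, where U+2581 marks a word boundary and `<0xNN>` pieces are byte fallbacks. A minimal sketch of that mapping, with a hypothetical helper name:

```python
# Sketch of the SentencePiece piece-to-text mapping used during vocab
# conversion (in the spirit of llama.cpp/convert.py; helper name is ours).

SPIECE_UNDERLINE = "\u2581"  # SentencePiece word-boundary marker

def piece_to_text(piece: str) -> str:
    """Map a SentencePiece piece to the text it represents."""
    # Byte-fallback pieces look like "<0x0A>" and encode a single raw byte.
    if piece.startswith("<0x") and piece.endswith(">"):
        return chr(int(piece[3:-1], 16))
    # U+2581 marks the start of a word, i.e. a preceding space.
    return piece.replace(SPIECE_UNDERLINE, " ")

print(repr(piece_to_text("\u2581hello")))  # ' hello'
```

The real converter also carries per-piece scores and token types over into the GGML/GGUF vocab, but the text mapping above is the part that differs most from the BERT-style vocab.txt path.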
If you have any tips or ideas on this, I'm all ears. Either way, will be looking into this.
Ok, I think it's basically working. The embeddings are still slightly different from what huggingface gives, but they're pretty close. It seems possible there are one or two things I'm not getting quite right.
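To put a number on "pretty close", one simple check (just an illustration, not part of the repo) is the cosine similarity between the two embedding vectors; a value very near 1.0 means the implementations agree up to scale:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Compare a GGML embedding against the huggingface reference, e.g.:
# sim = cosine_similarity(ggml_embedding, hf_embedding)
```

Small deviations here often come from pooling or normalization differences rather than the transformer layers themselves.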
Thank you for your excellent work.
bge-m3 is distinguished for its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.
When running the following command:

```shell
python convert-to-ggml.py './bge-m3' f16
```

I get:

```
Traceback (most recent call last)
FileNotFoundError: [Errno 2] No such file or directory: './bge-m3/vocab.txt'
```
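That traceback is expected with the current script: bge-m3's XLMRobertaTokenizer ships a `sentencepiece.bpe.model` rather than a BERT-style `vocab.txt`, so the hardcoded `vocab.txt` path fails. A hedged sketch of how the converter might branch on the files actually present (the helper name is hypothetical):

```python
import os

def detect_tokenizer_format(model_dir: str) -> str:
    """Guess which tokenizer files a checkpoint ships (hypothetical helper)."""
    if os.path.exists(os.path.join(model_dir, "vocab.txt")):
        return "wordpiece"      # BERT-style checkpoints
    if os.path.exists(os.path.join(model_dir, "sentencepiece.bpe.model")):
        return "sentencepiece"  # XLM-RoBERTa-style checkpoints such as bge-m3
    raise FileNotFoundError(
        f"no known tokenizer files found in {model_dir!r}"
    )
```

With something like this in place, the script could keep the existing vocab.txt path for BERT models and route bge-m3 through a SentencePiece loader instead.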
Will you make some changes to convert-to-ggml.py to support the new model?