
Can BAAI/bge-m3 be supported? #4

Open
sweetcard opened this issue Feb 4, 2024 · 2 comments

@sweetcard

Thank you for your excellent work.

bge-m3 is distinguished by its versatility in Multi-Functionality, Multi-Linguality, and Multi-Granularity.

When running the following command:
python convert-to-ggml.py './bge-m3' f16

Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory: './bge-m3/vocab.txt'

Will you make some changes to convert-to-ggml.py to support the new model?
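The error happens because the script expects a WordPiece-style vocab.txt, while XLM-RoBERTa checkpoints like bge-m3 ship a sentencepiece model file instead. As a minimal sketch (the helper name and file list are assumptions, not the script's actual code), a conversion script could probe for either tokenizer layout before failing:

```python
import os

# Hypothetical helper: WordPiece-based BERT checkpoints ship a vocab.txt,
# while XLM-RoBERTa checkpoints (like bge-m3) ship sentencepiece.bpe.model.
def find_vocab_file(model_dir):
    """Return the path to whichever tokenizer vocab file exists in model_dir."""
    for name in ("vocab.txt", "sentencepiece.bpe.model"):
        path = os.path.join(model_dir, name)
        if os.path.exists(path):
            return path
    raise FileNotFoundError(f"no tokenizer vocab file found in {model_dir}")
```

With a check like this, the script could branch to a sentencepiece-aware loading path instead of raising on the missing vocab.txt.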

@iamlemec
Owner

iamlemec commented Feb 4, 2024

Yup, definitely want to support the new magic from BAAI. It looks like they use a different tokenizer (XLMRobertaTokenizer) and a slightly different model architecture (xlm-roberta). I think we can copy over some of the more general vocab conversion strategies from llama.cpp/convert.py and then tweak the model code a bit.
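One tokenizer-agnostic strategy along those lines (a sketch, not the actual convert.py code): both WordPiece and sentencepiece tokenizers in Hugging Face expose a token-to-id mapping via tokenizer.get_vocab(), so the converter can sort that mapping by id and write the tokens as a flat, id-ordered list, regardless of which tokenizer produced it:

```python
# Sketch of a tokenizer-agnostic vocab export: given a token -> id mapping
# (the shape returned by HF's tokenizer.get_vocab()), emit tokens in id
# order so the converted model file can store them as a flat list.
def vocab_in_id_order(token_to_id):
    """Return the token list sorted by id, checking that ids are contiguous."""
    items = sorted(token_to_id.items(), key=lambda kv: kv[1])
    ids = [i for _, i in items]
    assert ids == list(range(len(ids))), "vocab ids must be contiguous from 0"
    return [tok for tok, _ in items]
```

The contiguity check matters because a gap in the id space would silently misalign every later token against the embedding rows.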

If you have any tips or ideas on this, I'm all ears. Either way, will be looking into this.

@iamlemec
Owner

iamlemec commented Feb 5, 2024

Ok, I think it's basically working. The embeddings are still slightly different from what Hugging Face gives, but they're pretty close. It's possible there are one or two things I'm not getting quite right.

Will keep refining in the coming days.
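One way to make "pretty close" quantitative is to compare the converted model's embedding against the Hugging Face reference with cosine similarity. A self-contained sketch (the vectors below are toy stand-ins, not real model outputs):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```

A similarity very close to 1.0 across a batch of test sentences would suggest only minor numerical drift (e.g. from f16 quantization) rather than a logic bug in the conversion.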
