How can I get the words embeddings? #17

stygian2a · 2019-02-25T14:07:00Z

Hello!
Thank you for sharing this code!

Is there an easy way to get the embedding of a particular word, like those shown in Table 5 of the paper?
Thank you!

glample · 2019-02-25T20:40:42Z

Hi!

Yes, I would suggest looking at the code in the notebook:
https://github.com/facebookresearch/XLM/blob/master/generate-embeddings.ipynb

Then something like this should work:

word_id = dico.index('cat')
model.embeddings.weight[word_id]
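
A minimal, self-contained sketch of that lookup, using a toy vocabulary and plain Python lists in place of the real objects. The names `word2id`, `embeddings`, and `get_embedding` are hypothetical stand-ins for illustration; in the notebook the vocabulary is the `dico` object and the table is `model.embeddings.weight`. The sketch also assumes you want to detect words that have no entry of their own, and returns `None` for them.

```python
# Toy stand-ins (hypothetical, not real XLM data): a word-to-index map and
# an embedding table with one row per vocabulary entry.
word2id = {"cat": 0, "chat": 1, "dog": 2}
embeddings = [
    [0.9, 0.1, 0.0],  # row for "cat"
    [0.8, 0.2, 0.1],  # row for "chat"
    [0.1, 0.9, 0.3],  # row for "dog"
]


def get_embedding(word):
    """Return the embedding row for `word`, or None if it has no entry."""
    word_id = word2id.get(word)
    if word_id is None:
        return None
    return embeddings[word_id]


cat_vec = get_embedding("cat")
missing_vec = get_embedding("zebra")  # not in the toy vocabulary
```

With the real model the same two steps apply: map the word to its index, then take that row of the embedding matrix.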

stygian2a · 2019-02-26T13:45:58Z

Thank you! How can I differentiate words from different languages (e.g., 'chat' means 'cat' in French)?

glample · 2019-02-26T15:16:19Z

You can just replace "cat" with "chat" in the code above. There is only one shared vocabulary, which contains the words for all languages. The vocabulary doesn't keep track of which word is used in which language.
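
Because both words live in the same table, their vectors can be compared directly. The sketch below illustrates this with cosine similarity on made-up vectors; `shared_vocab`, `table`, and `cosine_similarity` are hypothetical names, and the numbers are not real XLM embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# One shared vocabulary for all languages (toy values, for illustration only).
shared_vocab = {"cat": 0, "chat": 1, "car": 2}
table = [
    [1.0, 2.0, 0.0],  # "cat"
    [2.0, 4.0, 0.0],  # "chat", deliberately parallel to "cat"
    [0.0, 0.0, 3.0],  # "car", deliberately orthogonal to "cat"
]

sim_cat_chat = cosine_similarity(table[shared_vocab["cat"]], table[shared_vocab["chat"]])
sim_cat_car = cosine_similarity(table[shared_vocab["cat"]], table[shared_vocab["car"]])
```

No language tag is passed anywhere: the lookup depends only on the word string, which matches the description above.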

stygian2a · 2019-02-26T16:08:42Z

Got it, thx for everything!

vvssttkk · 2019-08-21T13:59:53Z

I want to get a model for the Russian language. Will mlm_xnli15_1024.pth do?

glample · 2019-08-21T14:42:55Z

Yes, it contains Russian. But these two models will give you better performance, and they also support Russian:

https://dl.fbaipublicfiles.com/XLM/mlm_17_1280.pth
https://dl.fbaipublicfiles.com/XLM/mlm_100_1280.pth

vvssttkk · 2019-08-21T14:50:24Z

Also, for the BPE, should I use this tokenization for mlm_17_1280.pth?

vvssttkk · 2019-08-21T16:22:20Z

So, when I run to_bpe(sentences) I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/sentences.bpe'

Should I create the tmp folder by hand? And where should I get sentences.bpe from?

Sorry, but your notebook has many errors caused by paths that point to other users' directories.
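
One way to narrow down an error like this is to check, before calling the notebook helpers, that every path they rely on exists on your machine. A minimal sketch follows; `missing_paths` is a hypothetical helper and the listed paths are placeholders, not the notebook's actual values.

```python
import os


def missing_paths(paths):
    """Return the subset of `paths` that does not exist on disk."""
    return [p for p in paths if not os.path.exists(p)]


# Placeholder locations (assumptions): replace them with your own paths
# to the BPE tool, the BPE codes, and the vocabulary file.
required = [
    "path/to/fastBPE/fast",
    "path/to/codes",
    "path/to/vocab",
]

for path in missing_paths(required):
    print("Missing:", path)
```

If anything is reported as missing, fix that path first and only then rerun the failing cell.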

vvssttkk · 2019-08-21T16:27:59Z

Also, what does this mean?
/private/home/aconneau/projects/XLM/data/wiki/17/175k/ I think this path is not available.

vvssttkk · 2019-08-21T16:30:10Z

Also, I don't have fastBPE in the tools folder.
Should I install it from here?

vvssttkk · 2019-08-23T14:33:24Z

I solved the errors and created a new PR describing the steps.

stygian2a closed this as completed Feb 26, 2019

OfirArviv mentioned this issue Aug 15, 2019

Getting embedding from XLM in differnet languages huggingface/transformers#1034

Closed

JxuHenry mentioned this issue Oct 28, 2019

I train UNMT with multi-GPU got the following errors! #224

Closed
