Applying Mat2Vec to COVID papers dataset
I used Mat2Vec (https://github.com/materialsintelligence/mat2vec), but instead of feeding it journal abstracts about materials, I fed it journal abstracts about COVID-19. One by one, I searched for each of the terms on the tasks page for the Kaggle COVID competition. For each search, the program returned a list of words from the COVID documents that were closely associated with the term I searched for.
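To make the "closely associated words" step concrete, here is a minimal sketch of a nearest-neighbour query over word vectors. It is not Mat2Vec's actual code: Mat2Vec trains Word2Vec embeddings, whereas this stand-in uses raw co-occurrence counts and cosine similarity so that it runs with the standard library alone. The toy abstracts and the names `cooccurrence_vectors`, `cosine`, and `most_similar` are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy abstracts; the real input was the COVID-19 journal abstracts.
ABSTRACTS = [
    "the virus spreads by droplet transmission between human hosts",
    "airborne transmission between hosts drives the outbreak",
    "the vaccine generated a strong immune response in mice",
    "a candidate vaccine generated antibodies in mice",
]


def cooccurrence_vectors(docs, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for doc in docs:
        tokens = doc.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(term, vectors, topn=5):
    """Rank every other word by cosine similarity to `term`, highest first."""
    if term not in vectors:
        return []
    scores = [
        (word, cosine(vectors[term], vec))
        for word, vec in vectors.items()
        if word != term
    ]
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))[:topn]


vectors = cooccurrence_vectors(ABSTRACTS)
for word, score in most_similar("vaccine", vectors):
    print(f"{word}: {score:.2f}")
```

With a real embedding model the query works the same way: look up the vector for the search term, then rank the rest of the vocabulary by similarity to it.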
What came up was interesting to me. I removed any words in the search results that seemed generic.
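I did the filtering by eye, but the same idea can be sketched as a stoplist filter. The contents of `GENERIC_WORDS` and the function name `drop_generic` are hypothetical placeholders, not the actual words I removed.

```python
# Hypothetical stoplist: "generic" was judged by eye, so these entries are placeholders.
GENERIC_WORDS = {"also", "however", "study", "results", "using"}


def drop_generic(neighbours, generic=GENERIC_WORDS):
    """Keep only neighbours that are not in the generic-word set, preserving order."""
    return [word for word in neighbours if word.lower() not in generic]


# Made-up raw result: "however" and "study" stand in for generic hits.
raw = ["natural", "however", "genotype", "study", "hosts"]
print(drop_generic(raw))
```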
Three particularly interesting insights:
- The word "green" shows up very often. I'm not sure what this is in relation to.
- The word "bone" shows up with "risks" in a couple of places. I haven't heard anything relating to COVID and bones.
- "Evolved" is associated with "designed" and "led".
Other interesting word associations:
transmission: green
incubation: mice, road
stable: genetically, green, end, lack, chain
environment: core, green
environmental: green, physicians, reducing, administration
risk: protected
risks: bone, green
origin: natural, numerous, genotype, hosts, human, physiological
genetic: risks, especially, virulence
evolution: green, patterns, rapidly, virulence
evolved: designed, led
vaccine: generated, directly, structure, green
therapeutic: immune, particles
therapeutics: interactions, genotype, core, selection, added
test: cough
tests: plasma, regression
testing: green, users, directly, standard, genotype
ethical: green
medical: green, four
diagnostics: bone, virulence, reagents, regression, expected
surveillance: control, national, evaluated
social: long
sharing: green, four, risks, genetically, expected
share: mayroon
information: core
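To check the impression that some words keep coming back, the associations above can be tallied with a few lines of Python. This is only a rough sketch: the function name `neighbour_counts` is made up for illustration, and the data is the list above pasted in as text.

```python
from collections import Counter

# Associations copied from the list above, one "query term: neighbours" per line.
ASSOCIATIONS = """\
transmission: green
incubation: mice, road
stable: genetically, green, end, lack, chain
environment: core, green
environmental: green, physicians, reducing, administration
risk: protected
risks: bone, green
origin: natural, numerous, genotype, hosts, human, physiological
genetic: risks, especially, virulence
evolution: green, patterns, rapidly, virulence
evolved: designed, led
vaccine: generated, directly, structure, green
therapeutic: immune, particles
therapeutics: interactions, genotype, core, selection, added
test: cough
tests: plasma, regression
testing: green, users, directly, standard, genotype
ethical: green
medical: green, four
diagnostics: bone, virulence, reagents, regression, expected
surveillance: control, national, evaluated
social: long
sharing: green, four, risks, genetically, expected
share: mayroon
information: core
"""


def neighbour_counts(text):
    """Count how many query terms each associated word appears under."""
    counts = Counter()
    for line in text.strip().splitlines():
        # Everything after the first colon is the comma-separated neighbour list.
        _, _, neighbours = line.partition(":")
        counts.update(w.strip() for w in neighbours.split(",") if w.strip())
    return counts


counts = neighbour_counts(ASSOCIATIONS)
for word, n in counts.most_common(5):
    print(word, n)
```

On this list, "green" appears under 11 of the 25 query terms, well ahead of any other word.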