GitHub - AskNowQA/erpredictor2: Entity and Relation Predictor

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
README		README
fastextserver.py		fastextserver.py
lcquad.json		lcquad.json
preparedata.py		preparedata.py
preparedataserialise.py		preparedataserialise.py
requirements.txt		requirements.txt
train.py		train.py

Repository files navigation

Detects whether a given phrase is an entity or relation, irrespective of upper or lower case characters.

Install dependencies based on requirements.txt. Create an elasticsearch instance and load entity data as specified at https://github.com/AskNowQA/EARL.

1) Run fastextserver.py
2) Run preparedata.py
3) Run preparedataserialise.py
4) Run train.py

Use the generated er.model file.


The idea is this:

Given a chunk of text, we do two things 

1) We calculate the fasttext embeddings of the chunk  
2) We search that text in elasticsearch. 

The fasttext embedding is a 300 length vector. From the elasticsearch results we extract 4 more values 

1) elasticsearch score 
2) fuzzwuzz (https://github.com/seatgeek/fuzzywuzzy) ratio of chunk with top ES match 
3) fuzzwuzz partial ratio of chunk with top ES match 
4) fuzzwuzz token sort ratio with top ES match. 

We concatenate all 4 values with the 300 length vector, hence we have a 304 length input vector. We pass this through a 2 layer neural network which has 2 outputs: entity and relation. We achieve 96% accuracy on 80:20 split of lcquad dataset.