Support languages which need a tokenizer (Chinese, Japanese, etc.) #123
Comments
Hello. The preprocessing pipeline can be customized to introduce a different tokenizer. See for instance: |
Hello @francolq,
I then looked into the code.
As far as I can see, it is not as simple as adding a tokenizer, since some runners are interrelated. It is hard to customize without knowing each runner's input and output, the step format, and the design principles of the runner API. (Currently I have to read the code and try to understand what it does, but because of my limited knowledge and English I may get stuck in places.) I would like to help make iepy compatible with CJK languages if anyone could describe the API principles for writing runners. @machinalis @jmansilla |
Sorry for the delay in this conversation. Can I still help here, @eromoe? |
@eromoe Right now I want to adapt iepy to Chinese. Could you give me a hand? |
@YanWenqiang Sorry, I only needed iepy's annotator and object binding. Since it was not easy to integrate Chinese, I have already built my own. |
@eromoe All right, thanks a lot. I have now run into the same trouble, and I really need someone who could help me. |
@eromoe I am doing information extraction on Chinese EMRs. Can I use iepy for entity relationship extraction? |
I think iepy needs a common interface for embedding a tokenizer, to support languages like Chinese, Japanese, etc. There is an old IE project with a GUI named GATE; it contains a pre-trained model and dataset, which may be helpful: https://gate.ac.uk/sale/tao/splitch15.html#sec:misc-creole:language-plugins:chinese
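To make the request concrete, here is a minimal sketch of what such a common tokenizer interface might look like. It is not iepy's actual API: `register_tokenizer`, `tokenize`, `whitespace_tokenizer` and `naive_cjk_tokenizer` are hypothetical names chosen for illustration, and the character-level CJK splitting is only a placeholder for a real word segmenter.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical convention: a tokenizer maps raw text to (token, start_offset) pairs.
Token = Tuple[str, int]
Tokenizer = Callable[[str], List[Token]]

# Hypothetical registry keyed by language code; not part of iepy.
_REGISTRY: Dict[str, Tokenizer] = {}


def register_tokenizer(lang: str, tokenizer: Tokenizer) -> None:
    """Associate a tokenizer with a language code such as 'en' or 'zh'."""
    _REGISTRY[lang] = tokenizer


def tokenize(text: str, lang: str) -> List[Token]:
    """Tokenize text with whatever tokenizer is registered for lang."""
    if lang not in _REGISTRY:
        raise ValueError(f"no tokenizer registered for language {lang!r}")
    return _REGISTRY[lang](text)


def whitespace_tokenizer(text: str) -> List[Token]:
    """Split on whitespace; adequate only for space-delimited languages."""
    return [(m.group(), m.start()) for m in re.finditer(r"\S+", text)]


# Hiragana and Katakana (U+3040-U+30FF) plus CJK Unified Ideographs (U+4E00-U+9FFF).
_CJK = "\u3040-\u30ff\u4e00-\u9fff"
_CJK_OR_RUN = re.compile(f"[{_CJK}]|[^\\s{_CJK}]+")


def naive_cjk_tokenizer(text: str) -> List[Token]:
    """Emit each CJK character as its own token and keep other runs whole.

    This is a naive stand-in: a real setup would call a word segmenter here.
    """
    return [(m.group(), m.start()) for m in _CJK_OR_RUN.finditer(text)]


register_tokenizer("en", whitespace_tokenizer)
register_tokenizer("zh", naive_cjk_tokenizer)
```

With something like this, the rest of a pipeline would only call `tokenize(text, lang)` and rely on the returned offsets, so swapping in a proper Chinese or Japanese segmenter would mean registering one more function.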