produce_labeled_data.py appears to only use Italian stopwords #88

Open
phdowling opened this issue Dec 15, 2015 · 2 comments
Comments

@phdowling
See produce_labeled_data, line 68:

                    for diz in val:
                        # Filter out linked stopwords
                        if diz['chunk'].lower() in stopwords.StopWords.words('italian'):
                            continue

I'm not really involved in this project, but I was just skimming through the code and it seems like the hardcoded selection of Italian stopwords might be a bug. Feel free to close this issue if that's not the case.

@marfox
Member

marfox commented Dec 15, 2015

Thanks for reporting, @phdowling! You are right, the language should be parametrized. I labeled this issue as refactoring.
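One way the parametrization could look is sketched below. This is only an illustration under stated assumptions: the `STOPWORDS` dictionary is a hypothetical stand-in for the project's own stopword lists (the internals of `stopwords.StopWords.words` are not shown in this thread), and `filter_stopword_chunks` and its `language` argument are made-up names, not existing project code.

```python
# Hypothetical stand-in for the project's stopword lists. The real script
# calls stopwords.StopWords.words('italian'); its internals are not shown here.
STOPWORDS = {
    'italian': {'il', 'la', 'di', 'e'},
    'english': {'the', 'a', 'of', 'and'},
}


def filter_stopword_chunks(entities, language):
    """Return the entities whose 'chunk' is not a stopword in `language`."""
    try:
        # Look the list up once, instead of once per chunk inside the loop.
        words = STOPWORDS[language]
    except KeyError:
        raise ValueError('No stopword list for language: %r' % language)
    # Same test as the original loop, with the language passed in
    # rather than hardcoded to 'italian'.
    return [diz for diz in entities if diz['chunk'].lower() not in words]


entities = [{'chunk': 'The'}, {'chunk': 'Roma'}, {'chunk': 'di'}]
print(filter_stopword_chunks(entities, 'english'))
print(filter_stopword_chunks(entities, 'italian'))
```

The caller would then pass the language code down from wherever the script already knows it (a command-line option, for instance), so the same filtering applies to any language that has a stopword list.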

@kartiksibal
@marfox I'd love to make the necessary changes, if you could just elaborate on the details. :-)
