Why not remove more stop-words in text processing? #177

JiaWenqi · 2019-03-13T07:15:08Z

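```python
import string

from nltk.tokenize import word_tokenize


def get_all_words(self):
    """ Return all words tokenized, in lowercase and without punctuation """
    # Tokenize self.text, drop tokens found in string.punctuation, lowercase the rest
    return [w.lower() for w in word_tokenize(self.text)
            if w not in string.punctuation]
```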
In this function, only punctuation is removed from the text. Other kinds of uninformative words, such as stop-words, are left in.
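For example, NLTK provides a list of English stop-words:

```python
from nltk.corpus import stopwords

# Needs the corpus downloaded once via nltk.download('stopwords')
words = stopwords.words('english')
```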
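A minimal sketch of what extending the filter with stop-words could look like, assuming the text has already been tokenized (for instance by `word_tokenize`). `filter_tokens` and the tiny `STOP_WORDS` set are hypothetical; the set stands in for `set(stopwords.words('english'))` so the example runs without the NLTK corpus. This is an illustration, not code from the project.

```python
import string

# Hypothetical, tiny stand-in for set(stopwords.words('english')).
# NLTK's real list is much longer; this subset keeps the example self-contained.
STOP_WORDS = {"the", "a", "an", "is", "was", "of", "on", "and", "not"}


def filter_tokens(tokens, stop_words=STOP_WORDS):
    """Lowercase tokens, then drop punctuation tokens and stop-words.

    The stop-word check runs on the lowercased token because NLTK's
    stop-word list is all lowercase ('The' would otherwise slip through).
    """
    result = []
    for token in tokens:
        word = token.lower()
        # Same punctuation test as get_all_words, plus the stop-word test
        if word in string.punctuation or word in stop_words:
            continue
        result.append(word)
    return result


# Tokens roughly as word_tokenize would produce them for "The cat sat on the mat."
tokens = ["The", "cat", "sat", "on", "the", "mat", "."]
print(filter_tokens(tokens))  # ['cat', 'sat', 'mat']
```

Converting the NLTK list to a `set` once, rather than testing membership against the list returned by `stopwords.words('english')` for every token, keeps each lookup constant-time.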


jstypka · 2019-03-13T09:47:36Z

Yeah, we want to leave the stop-words in so that word2vec works better.
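One way to see why stop-words can matter to word2vec, sketched here as an illustration rather than as the maintainer's stated reasoning: a skip-gram model learns from (center, context) pairs inside a sliding window, so deleting words changes which neighbours each word is trained against. `skipgram_pairs` and `contexts_of` are hypothetical helpers, and the window size of 2 is an arbitrary choice. NLTK's English stop-word list includes "not", so filtering it removes negation from the context of "good".

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs within a symmetric window,
    the kind of training pairs a skip-gram word2vec model learns from."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


def contexts_of(word, tokens, window=2):
    """Sorted, de-duplicated context words seen around `word`."""
    return sorted({ctx for center, ctx in skipgram_pairs(tokens, window) if center == word})


full = ["the", "movie", "was", "not", "good"]
# Same sentence after removing the stop-words "the", "was" and "not"
filtered = ["movie", "good"]

print(contexts_of("good", full))      # ['not', 'was']
print(contexts_of("good", filtered))  # ['movie']
```

Real implementations simplify less: the original word2vec and gensim's `Word2Vec` randomly shrink the effective window and can down-sample very frequent words (gensim's `sample` parameter), which reduces the weight of common words without deleting them outright.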
