sumbasic: KeyError #176
Comments
Hi! Any news on this? Thanks a lot for your work!
Maybe this could help: `word_freq_in_doc.get(w, 0)`
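To illustrate the suggested workaround: indexing a dict directly raises `KeyError` for a missing key, while `dict.get` with a default of `0` degrades gracefully. A minimal sketch (the `word_freq_in_doc` dict here is a hypothetical stand-in for the frequency table SumBasic builds, not the library's actual data):

```python
# Hypothetical document-level word frequency table
word_freq_in_doc = {"cat": 3, "dog": 1}

# Direct indexing raises KeyError for a word missing from the table,
# e.g. when sentence words and document words were stemmed differently:
# word_freq_in_doc["cats"]  # -> KeyError

# The .get form returns the default instead of raising:
freq = word_freq_in_doc.get("cats", 0)
print(freq)  # 0
```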
My understanding is that it is because I modified
It works now, but I still had no time to double-check that this is the correct solution.
Same error from the docker version:
Hello there. I encountered this error too. The problems are in two functions in `sum_basic.py`, `_get_content_words_in_sentence` and `_get_all_content_words_in_doc`. The different steps in those functions create two different sets of words, because the stop-word list is applied either before or after normalization and stemming. The `_get_all_words` function also calls the stemmer, which further confuses the stop-word filtering. So I just changed them like this:

```python
def _get_all_words_in_doc(self, sentences):
    # return self._stem_words([w for s in sentences for w in s.words])
    return [w for s in sentences for w in s.words]

def _get_content_words_in_sentence(self, sentence):
    # first normalize
    normalized_words = self._normalize_words(sentence.words)
    # then filter out stop words
    normalized_content_words = self._filter_out_stop_words(normalized_words)
    # then stem
    stemmed_normalized_content_words = self._stem_words(normalized_content_words)
    return stemmed_normalized_content_words

def _get_all_content_words_in_doc(self, sentences):
    all_words = self._get_all_words_in_doc(sentences)
    normalized_words = self._normalize_words(all_words)
    normalized_content_words = self._filter_out_stop_words(normalized_words)
    stemmed_normalized_content_words = self._stem_words(normalized_content_words)
    return stemmed_normalized_content_words
```
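To see why the ordering matters, here is a self-contained sketch using simplified stand-ins for the normalizer, stemmer, and stop-word list (not sumy's actual implementations). When stemming runs before stop-word filtering, stemmed forms no longer match the stop list, so the sentence-level and document-level word sets diverge, and a sentence word can miss the frequency dict, producing the `KeyError`:

```python
STOP_WORDS = {"this", "is"}

def normalize(words):
    # stand-in normalizer: lowercase everything
    return [w.lower() for w in words]

def stem(words):
    # crude stand-in stemmer: strip a trailing "s"
    return [w[:-1] if w.endswith("s") else w for w in words]

words = ["This", "is", "cats"]

# Buggy order: stem before filtering. "this" -> "thi" and "is" -> "i",
# so neither matches the stop list and both survive as "content" words.
buggy = [w for w in stem(normalize(words)) if w not in STOP_WORDS]

# Fixed order (as in the patch): normalize, filter stop words, then stem.
fixed = stem([w for w in normalize(words) if w not in STOP_WORDS])

print(buggy)  # ['thi', 'i', 'cat']
print(fixed)  # ['cat']
```

With the fixed order applied consistently in both functions, every word produced for a sentence is guaranteed to appear in the document-level frequency table, so the lookup can no longer fail.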
sumbasic failed on text:
common.txt
sumy==0.10.0