Good morning
First of all, I wanted to congratulate you on this awesome repository; it is very well made, and the practical results are great, on top of being easy to achieve.
I was wondering: is there a way I can use a pre-processed list of strings (stems or lemmas) with your example pipeline?
Hi, yes. You can implement your own function that constructs an `ObjectDocumentModel`. You can find inspiration in https://github.com/miso-belica/sumy/blob/master/sumy/parsers/plaintext.py#L60-L78 Then pass it to the summarizer and that's it. Stop words and even the stemmer are optional parts, so if you omit them the summarizer will process the raw `ObjectDocumentModel` it gets.
Hi, thank you for your insight. Following your code, I saw that creating a `Sentence` requires a `Tokenizer`, which in my case would be redundant because I already have my tokens.
Would it be sufficient to change lines 69 and 75 to `sentences = [Sentence(s, None) for s in current_paragraph]`?
Well, I would avoid changing sumy unless it is really needed. You can rather implement your own tokenizer like this:
```python
class Tokenizer:
    language = "en???"

    def to_sentences(self, paragraph):
        return paragraph

    def to_words(self, sentence):
        return sentence  # make sure this is a collection
```
It simply returns your already-tokenized data. I don't know the precise data structures you use, but I believe you can omit/replace some parts of sumy and it will work without modifying the sumy code. But I may be wrong; I've never worked with the case you described here. Let me know :)
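To make the pass-through idea concrete, here is a self-contained sanity check with no sumy dependency; the class name `PretokenizedTokenizer` and the sample stemmed tokens are invented for illustration:

```python
class PretokenizedTokenizer:
    """Duck-typed stand-in for sumy's Tokenizer: it trusts that the
    input is already split into sentences and word tokens."""
    language = "en"

    def to_sentences(self, paragraph):
        # The paragraph is already a list of pre-tokenized sentences.
        return paragraph

    def to_words(self, sentence):
        # The sentence is already a collection of stems/lemmas.
        return sentence


tokenizer = PretokenizedTokenizer()
paragraph = [["good", "morn"], ["awesom", "repositori"]]

# Everything passes through unchanged:
assert tokenizer.to_sentences(paragraph) == paragraph
assert tokenizer.to_words(paragraph[0]) == ["good", "morn"]
```

Since `to_words` returns the sentence as-is, the caller must make sure each sentence really is a collection of tokens, exactly as the comment in the snippet above warns.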