Good morning
First of all, I wanted to congratulate you on this awesome repository; it is very well made, and the practical results are great, on top of being easy to achieve.
I was wondering: is there a way I can use a pre-processed list of strings (stems or lemmas) with your example pipeline?
Hi, yes. You can implement your own function that constructs an `ObjectDocumentModel`. You can find inspiration in https://github.com/miso-belica/sumy/blob/master/sumy/parsers/plaintext.py#L60-L78 Then pass it to the summarizer and that's it. Stop words and even the stemmer are optional parts, so if you omit them the summarizer will process the raw `ObjectDocumentModel` it gets.
Hi, thank you for your insight. Following your code, I saw that creating a `Sentence` requires a `Tokenizer`, which in my case would be redundant because I already have my tokens.
Would it be sufficient to change lines 69 and 75 to `sentences = [Sentence(s, None) for s in current_paragraph]`?
Well, I would avoid changing sumy unless it is really needed. You can rather implement your own tokenizer like this:
```python
class Tokenizer:
    language = "en???"

    def to_sentences(self, paragraph):
        return paragraph

    def to_words(self, sentence):
        return sentence  # make sure this is a collection
```
It simply returns your already-tokenized data. I don't know the precise data structures you use, but I believe you can omit/replace some parts of sumy and it will work without modifying the sumy code. But I may be wrong; I've never worked with the case you described here. Let me know :)
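To make the pass-through idea concrete, here is a self-contained sanity check with no sumy dependency; the class name `PretokenizedTokenizer` and the sample stemmed tokens are invented for illustration:

```python
class PretokenizedTokenizer:
    """Duck-typed stand-in for sumy's Tokenizer: it trusts that the
    input is already split into sentences and word tokens."""
    language = "en"

    def to_sentences(self, paragraph):
        # The paragraph is already a list of pre-tokenized sentences.
        return paragraph

    def to_words(self, sentence):
        # The sentence is already a collection of stems/lemmas.
        return sentence


tokenizer = PretokenizedTokenizer()
paragraph = [["good", "morn"], ["awesom", "repositori"]]

# Everything passes through unchanged:
assert tokenizer.to_sentences(paragraph) == paragraph
assert tokenizer.to_words(paragraph[0]) == ["good", "morn"]
```

Since `to_words` returns the sentence as-is, the caller must make sure each sentence really is a collection of tokens, exactly as the comment in the snippet above warns.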