This looks like a problem with detecting the language code correctly. I didn't dig too deeply into what exactly happens, but either the input text gets modified in a bad way or there is a bug in pycld2.detect(text).
A workaround that works is to provide the language code explicitly when the Rake() object is initialised, so the automatic language detection is not needed.
I tested a bit, and it's not that simple to reproduce because I'm using text extracted from PDF files. If I just copy and paste the text here or into a new text file, the error seems to disappear. However, I set up a small Python gist with example code that triggers the bug: https://gist.github.com/7homasSutter/45c4fe43283c67feb1caff3175876baa
By default, the gist script expects the PDF file at './test/ChopChop.pdf'. Here is the example PDF: PDF-Download
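If the failure comes from characters that PDF extraction leaves in the text (an assumption; the thread does not pin down the exact cause), one thing to try before handing the text to Rake is to clean it first. Below is a minimal standard-library sketch; the function name clean_pdf_text is hypothetical, not part of Rake or pycld2. It drops unpaired surrogates (which cannot be encoded as valid UTF-8), folds ligatures such as "ﬃ" into plain letters, removes invisible control/format characters, and collapses whitespace, while keeping accented Spanish letters intact.

```python
import unicodedata


def clean_pdf_text(text: str) -> str:
    """Hypothetical pre-cleaning step for PDF-extracted text (assumption:
    stray characters from extraction are what break language detection)."""
    # 1. Drop unpaired surrogates, which cannot be encoded as valid UTF-8.
    text = text.encode("utf-8", errors="ignore").decode("utf-8")
    # 2. Fold compatibility characters, e.g. the "ffi" ligature U+FB03, into plain letters.
    text = unicodedata.normalize("NFKC", text)
    # 3. Remove control/format characters (Unicode category "C*") but keep whitespace.
    text = "".join(
        ch for ch in text
        if ch.isspace() or not unicodedata.category(ch).startswith("C")
    )
    # 4. Collapse runs of whitespace (newlines, form feeds from page breaks) into single spaces.
    return " ".join(text.split())


raw = "o\ufb03ce\u00ad hours\x0c"  # ligature, soft hyphen and form feed from a PDF
print(clean_pdf_text(raw))
```

The cleaned string can then be passed to Rake as before. If the error persists on cleaned text, that points more towards the detection step itself, in which case setting the language code explicitly remains the fallback.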
I am trying to extract keywords from the amazon_reviews dataset. When using it for Spanish, I encounter this error and am unable to resolve it.
Is there a workaround, for example by manually entering the language code?