TF-IDF uses the wrong transform_many() #1629
Comments
Maybe linked to #1576
We did not want to hide a for loop in the
for document in documents:
    tfidf.learn_one(document)
@smastelini Do you think we should raise an error here, or add a simple for loop with learn_one and transform_one? The best would be to have a dedicated and optimized batch TF-IDF. I don't have much time to work on it yet, but at some point I could do it. :)
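The "simple for loop" option mentioned above could be sketched roughly as follows. This is only an illustration, not River's implementation: the helper names `learn_many_via_loop` and `transform_many_via_loop` and the `RecordingStub` class are hypothetical, and the helpers assume nothing more than an object exposing `learn_one(text)` and `transform_one(text)`.

```python
def learn_many_via_loop(transformer, documents):
    """Update the transformer one document at a time (hypothetical helper)."""
    for document in documents:
        transformer.learn_one(document)


def transform_many_via_loop(transformer, documents):
    """Transform each document individually and collect the results.

    Returns one dict per document, so the values match transform_one().
    """
    return [transformer.transform_one(document) for document in documents]


class RecordingStub:
    """Hypothetical transformer, used only to exercise the helpers."""

    def __init__(self):
        self.seen = 0

    def learn_one(self, text):
        # Count how many documents have been observed.
        self.seen += 1

    def transform_one(self, text):
        # Weight every word by 1 / (number of documents seen).
        return {word: 1.0 / self.seen for word in text.split()}


documents = ["a b", "c"]
stub = RecordingStub()
learn_many_via_loop(stub, documents)
rows = transform_many_via_loop(stub, documents)
```

The result is a plain list of dicts; turning it into a dataframe is left out here to keep the sketch free of dependencies.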
Hi @e10e3 and @raphaelsty. I believe it should raise a
Versions
River version: 0.21.1
Python version: 3.12.7
Operating system: macOS 14.7
Describe the bug
The output of `TFIDF.transform_one()` and `TFIDF.transform_many()` are inconsistent. `transform_one()` gives a dictionary mapping a word to its importance, while `transform_many()` gives a dataframe of the word counts.

This is because TFIDF inherits from BagOfWords but does not reimplement the `*_many()` methods, leading Python to use the ones from BagOfWords in their absence.

Code to reproduce
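The original snippet is not preserved here. As a stand-in, the following is a library-free sketch of the mechanism described above. `ToyBagOfWords` and `ToyTFIDF` are hypothetical classes, not River's code, and the smoothed idf formula is an assumption chosen only for illustration.

```python
from collections import Counter
import math


class ToyBagOfWords:
    """Hypothetical stand-in for a bag-of-words transformer."""

    def transform_one(self, text):
        # Raw term counts for a single document.
        return dict(Counter(text.split()))

    def transform_many(self, texts):
        # Batch version, written in terms of raw counts only.
        return [dict(Counter(text.split())) for text in texts]


class ToyTFIDF(ToyBagOfWords):
    """Overrides transform_one() but inherits transform_many() unchanged."""

    def __init__(self):
        self.n_docs = 0
        self.doc_freq = Counter()

    def learn_one(self, text):
        # Track how many documents contain each word.
        self.n_docs += 1
        self.doc_freq.update(set(text.split()))

    def transform_one(self, text):
        counts = Counter(text.split())
        total = sum(counts.values())
        # Term frequency times a smoothed idf (assumed formula).
        return {
            word: (count / total)
            * (math.log((1 + self.n_docs) / (1 + self.doc_freq[word])) + 1)
            for word, count in counts.items()
        }


docs = ["the cat sat", "the dog sat down"]
tfidf = ToyTFIDF()
for doc in docs:
    tfidf.learn_one(doc)

single = tfidf.transform_one(docs[0])  # float weights from the subclass
batch = tfidf.transform_many(docs)[0]  # integer counts from the parent class
```

Because the subclass never defines `transform_many()`, attribute lookup falls through to the parent, so the batch path returns counts while the single-item path returns weights.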
Output
Expected behaviour
`transform_many` should give a dataframe of floats. I don't know if this is exactly what the values should be, but this is how it should look: