Issue Facing While Fitting The Model With Huge Data #64
That is most likely the result of a large vocabulary. Setting …
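The quoted suggestion is truncated, but the custom model below revolves around `min_df`, so here is a minimal sketch (plain scikit-learn, assuming the suggestion concerns `min_df`) of how pruning rare character n-grams shrinks the vocabulary, and with it the memory footprint:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

strings = ["apple inc", "apple incorporated", "aple inc",
           "banana corp", "bananna corp"]

# Character 3-grams, matching polyfuzz's default n_gram_range=(3, 3)
full = TfidfVectorizer(analyzer="char", ngram_range=(3, 3)).fit(strings)

# min_df=2 drops every 3-gram that occurs in only a single string,
# which can shrink the vocabulary considerably on noisy real data
pruned = TfidfVectorizer(analyzer="char", ngram_range=(3, 3), min_df=2).fit(strings)

print(len(full.vocabulary_), len(pruned.vocabulary_))
```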
I have created a custom TF-IDF model and tried increasing the `min_df` value. Below is the code for the custom model:

```python
from polyfuzz.models import TFIDF
from sklearn.feature_extraction.text import TfidfVectorizer


class CustomTFIDF(TFIDF):
    def __init__(self,
                 n_gram_range=(3, 3),
                 clean_string=True,
                 min_similarity=0.75,
                 top_n=1,
                 cosine_method="sparse",
                 model_id=None,
                 min_df_custom=2):  # Custom parameter for min_df
        super().__init__(n_gram_range, clean_string, min_similarity,
                         top_n, cosine_method, model_id)
        self.min_df_custom = min_df_custom  # Store the custom min_df value

    def _extract_tf_idf(self,
                        from_list,
                        to_list=None,
                        re_train=True):
        if to_list:
            if re_train:
                # Customize the TfidfVectorizer with min_df
                self.vectorizer = TfidfVectorizer(
                    min_df=self.min_df_custom,
                    analyzer=self._create_ngrams).fit(to_list + from_list)
            self.tf_idf_to = self.vectorizer.transform(to_list)
            tf_idf_from = self.vectorizer.transform(from_list)
        else:
            if re_train:
                # Customize the TfidfVectorizer with min_df
                self.vectorizer = TfidfVectorizer(
                    min_df=self.min_df_custom,
                    analyzer=self._create_ngrams).fit(from_list)
            self.tf_idf_to = self.vectorizer.transform(from_list)
            tf_idf_from = self.tf_idf_to
        return tf_idf_from, self.tf_idf_to
```
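Independent of polyfuzz, the fit/transform flow in `_extract_tf_idf` can be exercised with plain scikit-learn. `_create_ngrams` is polyfuzz-internal, so a plain character analyzer stands in for it here (an assumption for illustration only):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

from_list = ["aple inc", "banana corp"]
to_list = ["apple inc", "banana corporation"]

# Fit on both lists, as the to_list branch of _extract_tf_idf does
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 3),
                             min_df=1).fit(to_list + from_list)
tf_idf_to = vectorizer.transform(to_list)
tf_idf_from = vectorizer.transform(from_list)

# Cosine similarity between every from/to pair on the sparse matrices
sims = cosine_similarity(tf_idf_from, tf_idf_to)
best = sims.argmax(axis=1)  # index of the closest to_list entry per from_list entry
```

Raising `min_df` here removes n-gram columns from both matrices at once, since both are produced by the same fitted vectorizer.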
You can try setting the …
I am facing the same issue, even after changing it to a higher value. The error I am getting is …
Have you tried using …
I have a dataset of around 166,793 records that I want to fit with the TF-IDF model.
While fitting the model, the server process gets killed (I have tried with a 20 GB RAM configuration).
Is there any solution?
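One possible way to keep peak memory down (a sketch, not something polyfuzz does itself) is to transform the records in chunks rather than all at once; each sparse block can then be stacked, or matched against the target matrix chunk by chunk:

```python
import scipy.sparse as sp
from sklearn.feature_extraction.text import TfidfVectorizer

records = [f"record number {i}" for i in range(1000)]  # stand-in for the 166,793 strings

# min_df=2 prunes n-grams seen in only one record, shrinking the vocabulary
vectorizer = TfidfVectorizer(analyzer="char", ngram_range=(3, 3),
                             min_df=2).fit(records)

chunk_size = 250
blocks = []
for start in range(0, len(records), chunk_size):
    # Each transform only materialises one chunk's worth of rows at a time
    blocks.append(vectorizer.transform(records[start:start + chunk_size]))

tf_idf = sp.vstack(blocks)
```

The fit itself still sees all strings, but the per-chunk transforms bound how much intermediate data is alive at any moment, which can be the difference between finishing and being OOM-killed.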