import torch
import numpy as np
import polyfuzz
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the T5 model and tokenizer
model_name = 't5-small'
model = T5ForConditionalGeneration.from_pretrained(model_name)
tokenizer = T5Tokenizer.from_pretrained(model_name)

# Define your target and candidate strings
target_strings = ['The quick brown fox jumps over the lazy dog', 'The sky is blue']
candidate_strings = ['The fox is quick and the dog is lazy', 'The ocean is blue']

# Tokenize the strings and convert them to T5 embeddings
target_tokens = tokenizer.batch_encode_plus(target_strings, padding=True, truncation=True, return_tensors='pt')
candidate_tokens = tokenizer.batch_encode_plus(candidate_strings, padding=True, truncation=True, return_tensors='pt')
with torch.no_grad():
    target_embeddings = model.encoder(input_ids=target_tokens['input_ids']).last_hidden_state.detach().numpy()
    candidate_embeddings = model.encoder(input_ids=candidate_tokens['input_ids']).last_hidden_state.detach().numpy()

# Create a PolyFuzz object with default settings
model = polyfuzz.PolyFuzz()

# Fit the model with the T5 embeddings
model.fit(target_embeddings, candidate_embeddings)

# Get the matches between the target and candidate strings
matches = model.get_matches()
It is not possible within PolyFuzz to supply the model with embeddings only. You would have to pass the raw strings yourself and either create a custom model or use something like Flair to load the model.
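For anyone landing here, the custom-model route could look roughly like the sketch below. It subclasses PolyFuzz's `BaseMatcher`, takes raw strings, and computes the T5 embeddings inside `match`. Several things here are my own assumptions and not something PolyFuzz prescribes: the names `mean_pool`, `best_cosine_matches`, `make_t5_polyfuzz`, and `T5Matcher` are made up for illustration, the per-token encoder output is collapsed to one vector per string by mean pooling over non-padding tokens, and the score is plain cosine similarity. The heavy imports are deferred into the factory function so the two NumPy helpers can be checked on their own.

```python
import numpy as np


def mean_pool(hidden, mask):
    """Average token vectors per string, ignoring padding (assumed pooling strategy)."""
    mask = mask[..., None].astype(hidden.dtype)  # (batch, seq, 1)
    return (hidden * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)


def best_cosine_matches(from_vecs, to_vecs):
    """For each row of from_vecs, return the index and cosine score of its closest row in to_vecs."""
    a = from_vecs / np.linalg.norm(from_vecs, axis=1, keepdims=True)
    b = to_vecs / np.linalg.norm(to_vecs, axis=1, keepdims=True)
    sims = a @ b.T
    return sims.argmax(axis=1), sims.max(axis=1)


def make_t5_polyfuzz(model_name="t5-small"):
    """Build a PolyFuzz instance backed by a hypothetical T5 matcher."""
    # Deferred imports: only needed when the matcher is actually built.
    import pandas as pd
    import torch
    from polyfuzz import PolyFuzz
    from polyfuzz.models import BaseMatcher
    from transformers import T5EncoderModel, T5Tokenizer

    tokenizer = T5Tokenizer.from_pretrained(model_name)
    encoder = T5EncoderModel.from_pretrained(model_name).eval()

    def embed(strings):
        tokens = tokenizer(list(strings), padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**tokens).last_hidden_state
        return mean_pool(hidden.numpy(), tokens["attention_mask"].numpy())

    class T5Matcher(BaseMatcher):
        # This sketch assumes to_list is always provided.
        def match(self, from_list, to_list=None, **kwargs):
            idx, scores = best_cosine_matches(embed(from_list), embed(to_list))
            # PolyFuzz expects a DataFrame with From / To / Similarity columns.
            return pd.DataFrame({
                "From": from_list,
                "To": [to_list[i] for i in idx],
                "Similarity": scores,
            })

    return PolyFuzz(T5Matcher())


# Usage (downloads the t5-small weights on first run):
# pf = make_t5_polyfuzz()
# pf.match(target_strings, candidate_strings)
# print(pf.get_matches())
```

Note that the strings, not the embeddings, go into `pf.match(...)`. The embedding step lives inside the matcher, which is what "pass the raw strings yourself" amounts to in practice.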
I am learning how to use PolyFuzz with T5 embeddings.
I get an error when running the code above (MWE).