When running classify on a larger sequence of texts, the embedder sometimes returns a NoneType object for one of the texts. This results in an error at line 235 in app/classifier.py:
```python
embeddings = self.embedder.get_embedding(seqs)
embedding_shape = embeddings[0].shape
all_embeddings = np.zeros(
    [len(embeddings), MAX_SEQ_LENGTH, embedding_shape[1]])
all_input_mask = np.zeros([len(embeddings), MAX_SEQ_LENGTH])
for i, matrix in enumerate(embeddings):
    all_embeddings[i][:len(matrix)] = matrix  # <-- line 235: fails when matrix is None
    all_input_mask[i][:len(matrix)] = 1
```
It is possible to compute an embedding for the text at which the NoneType object occurs by feeding it into the get_embedding function separately or as part of a smaller list. Within a given batch, however, the failure is persistent for that specific text.
For example:
- seqs is a list of 200 texts, and for one text (e.g. at position i=10) get_embedding returns None.
- If the order of the texts is changed, the function still fails at the same text (now e.g. at position i=120).
- Calling get_embedding on the failing text alone (e.g. get_embedding([seqs[10]])) returns the correct embedding.
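As a quick check, the failing positions can be located up front. This is just a sketch, assuming only the list-in/list-out behavior of get_embedding shown in the snippet above:

```python
# Sketch: report which batch positions come back as None (assumes
# get_embedding takes a list of texts and returns a list of per-text
# embedding matrices, as in the snippet above).
embeddings = self.embedder.get_embedding(seqs)
none_idx = [i for i, m in enumerate(embeddings) if m is None]
print(f"{len(none_idx)} of {len(seqs)} texts embedded as None: {none_idx}")
```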
I built the following workaround, but I would like to understand why this happens and find a better solution:
```python
for i, matrix in enumerate(embeddings):
    try:
        all_embeddings[i][:len(matrix)] = matrix
        all_input_mask[i][:len(matrix)] = 1
    except TypeError:
        # matrix is None (len(None) raises TypeError): re-embed this text alone
        matrix = self.embedder.get_embedding([seqs[i]])[0]
        all_embeddings[i][:len(matrix)] = matrix
        all_input_mask[i][:len(matrix)] = 1
```
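A possibly cleaner variant, sketched under the same assumptions (get_embedding is list-in/list-out, and re-embedding the failing text on its own succeeds, as observed above), is to repair the None entries before the copy loop instead of catching the TypeError mid-assignment:

```python
# Sketch: retry None entries individually before building the dense arrays,
# so the original copy loop can then run unchanged. Assumes the observed
# behavior that re-embedding the failing text alone returns a valid matrix.
embeddings = self.embedder.get_embedding(seqs)
for i, matrix in enumerate(embeddings):
    if matrix is None:
        embeddings[i] = self.embedder.get_embedding([seqs[i]])[0]
```

That said, this only papers over the problem; I would still like to understand why the batched call returns None in the first place.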