
mistral[minor]: Added Retrying Mechanism in case of Request Rate Limit Error for MistralAIEmbeddings #27818

Merged · 10 commits · Dec 11, 2024
21 changes: 17 additions & 4 deletions libs/partners/mistralai/langchain_mistralai/embeddings.py
@@ -4,6 +4,7 @@
from typing import Iterable, List

import httpx
+from httpx import Response
from langchain_core.embeddings import Embeddings
from langchain_core.utils import (
secret_from_env,
@@ -15,6 +16,7 @@
SecretStr,
model_validator,
)
+from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_fixed
from tokenizers import Tokenizer # type: ignore
from typing_extensions import Self

@@ -209,16 +211,27 @@ def embed_documents(self, texts: List[str]) -> List[List[float]]:
List of embeddings, one for each text.
"""
         try:
-            batch_responses = (
-                self.client.post(
+            batch_responses = []
+
+            @retry(
+                retry=retry_if_exception_type(Exception),
Collaborator:
Generally, retries should never be implemented on 4xx errors (except for 408 and 429); e.g., a 403 should not be retried by default.
  • What do we do in other parts of the code? Perhaps there's a better example that can be adopted?
  • What do other models in the code base do in terms of exposing the retry parameters, so users can adjust them? (e.g., what if someone wants to have the first retry after 1 second rather than 30 seconds?)

Contributor Author:
  • @eyurtsev it is a 429 error, therefore retrying makes sense.
  • I was not able to find any related example.
  • Yes, I can make the wait and stop seconds into parameters.

+                wait=wait_fixed(30),  # Wait 30 seconds between retries
+                stop=stop_after_attempt(5),  # Stop after 5 attempts
+            )
+            def _embed_batch(batch: List[str]) -> Response:
+                response = self.client.post(
                     url="/embeddings",
                     json=dict(
                         model=self.model,
                         input=batch,
                     ),
                 )
-                for batch in self._get_batches(texts)
-            )
+                if response.status_code == 429:

Collaborator:
Any reason not to use raise_for_status? We're trying not to drop the original exception, which might have useful information inside it.

+                    raise Exception("Requests rate limit exceeded")
Collaborator:
This code takes an exception that was good and informative and turns it into a broad Exception of type Exception -- this is usually not a good pattern for exception handling. The stack trace is partially lost, the exception type is less specific, etc.

Contributor Author:
  • There were no specific exceptions being raised to begin with, but I can change it from a general Exception.
+                    return response
+
+            for batch in self._get_batches(texts):
+                batch_responses.append(_embed_batch(batch))
             return [
                 list(map(float, embedding_obj["embedding"]))
                 for response in batch_responses