Issue with concurrent requests on AWS Fargate #22

Open
eliran89c opened this issue Jun 20, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@eliran89c

Describe the Bug
I am encountering an issue where concurrent requests are being processed sequentially rather than simultaneously when deployed on AWS Fargate.
I suspect the problem is that boto3 runs synchronously, and its calls are blocking.
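
To make the suspicion concrete, here is a minimal sketch of the pattern I mean (not the project's actual code; the client setup, endpoint path, and model id are placeholders): a synchronous boto3 call inside an `async def` endpoint holds the event loop for the whole call, so other requests queue behind it.

```python
# Illustrative sketch only -- placeholders, not the project's actual code.
import boto3
from fastapi import FastAPI

app = FastAPI()
bedrock = boto3.client("bedrock-runtime")  # synchronous boto3 client

@app.post("/chat/completions")
async def chat_completions(payload: dict):
    # invoke_model is a blocking network call; while it waits for the
    # model response, the single event loop cannot serve anything else.
    response = bedrock.invoke_model(
        modelId=payload.get("model", "<model-id>"),  # placeholder
        body=payload.get("body", "{}"),
    )
    return {"result": response["body"].read().decode()}
```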

API Details

  • API Used: /chat/completions
  • Model Used: all of them

To Reproduce
Steps to reproduce the behavior:

  1. Deploy the service on AWS Fargate following the standard setup procedures.
  2. Send multiple concurrent requests (e.g., 10 concurrent requests) to the API; a rough script is shown after this list.
  3. Observe that the requests are processed sequentially instead of concurrently.
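
A rough script for step 2 (the endpoint URL and payload are placeholders for your own deployment). If requests are handled sequentially, the printed durations stack up instead of overlapping:

```python
# Rough reproduction script -- URL and payload are placeholders.
import asyncio
import time
import httpx

URL = "http://<your-fargate-endpoint>/chat/completions"  # placeholder
PAYLOAD = {
    "model": "<model-id>",  # placeholder
    "messages": [{"role": "user", "content": "Hello"}],
}

async def one_request(client: httpx.AsyncClient, i: int) -> None:
    start = time.perf_counter()
    resp = await client.post(URL, json=PAYLOAD, timeout=120)
    print(f"request {i}: status={resp.status_code}, took {time.perf_counter() - start:.1f}s")

async def main() -> None:
    async with httpx.AsyncClient() as client:
        # Fire 10 requests at once and compare the reported durations.
        await asyncio.gather(*(one_request(client, i) for i in range(10)))

asyncio.run(main())
```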

Expected Behavior
I expected that when sending multiple concurrent requests to the API, all requests would be handled simultaneously, or at least as many as the server can handle.

@eliran89c eliran89c added the bug Something isn't working label Jun 20, 2024
@daixba
Contributor

daixba commented Jun 21, 2024

Concurrency and asynchronous calls are natively supported by FastAPI. I did a quick test with 2 concurrent requests (with long responses) and I can see both streaming in parallel, though I didn't test it via code.

You can probably try the following:

  1. Try fewer requests (like 2 requests) first and see if the issue still exists.
  2. Try testing locally (the code can run locally).
  3. Try increasing the capacity of the Fargate task and retest (by default it has only 1 core, so I would not expect it to handle a large number of concurrent requests).

@eliran89c
Author

eliran89c commented Jun 23, 2024

Hi @daixba, I forgot to mention that I'm not streaming the response.
With streaming, it works better, but it is still not perfect (I monitor the health-check endpoint, and it times out from time to time).

But without streaming, the API waits for each request to finish before it can handle other requests.

Concurrency and asynchronous calls are natively supported by FastAPI

I agree; this is why I think the problem is with boto3.

@eliran89c
Author

@daixba when I run boto3 with asyncio, it works as expected:
#23
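
For reference, this is roughly the idea (a sketch of the general approach only, not necessarily what #23 implements; the model id is a placeholder): hand the blocking boto3 call to a worker thread so the event loop stays free.

```python
# Sketch of the general idea -- not necessarily what #23 implements.
import asyncio
import boto3
from fastapi import FastAPI

app = FastAPI()
bedrock = boto3.client("bedrock-runtime")

def invoke_sync(model_id: str, body: str) -> str:
    # Plain blocking boto3 call, safe to run in a worker thread.
    response = bedrock.invoke_model(modelId=model_id, body=body)
    return response["body"].read().decode()

@app.post("/chat/completions")
async def chat_completions(payload: dict):
    # asyncio.to_thread runs the blocking call in a threadpool, so other
    # requests can be served while this one waits on the model.
    result = await asyncio.to_thread(
        invoke_sync,
        payload.get("model", "<model-id>"),  # placeholder
        payload.get("body", "{}"),
    )
    return {"result": result}
```

Declaring the endpoint with a plain `def` instead of `async def` has a similar effect, since FastAPI runs synchronous endpoints in a threadpool.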

@QingyeSC

So can this be resolved? My high-concurrency requests break as soon as they hit non-streaming responses.
