
Batch size too large will crash pipeline with 500 errors #49

Open
ad-astra-video opened this issue Apr 9, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@ad-astra-video
Collaborator

Describe the bug

If a batch size that is too large for the GPU is requested, the container will start throwing 500 errors. I believe the container does not actually crash in this instance, but it is a bad experience for the user.

Reproduction steps

  1. Send a text-to-image request with a batch size of 10 to the ByteDance/SDXL-Lightning model.
  2. The ai-runner container will start throwing 500 errors as retries continue, if running on an RTX 4090 or a GPU with less VRAM.
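
For reference, a minimal reproduction sketch. The endpoint path and JSON field names below are assumptions about the ai-runner API and may not match exactly; adjust them to the actual text-to-image route:

```python
# Reproduction sketch, assuming the runner is reachable on localhost:8000 and
# that the route and field names below match the ai-runner text-to-image API
# (treat "num_images_per_prompt" and the exact path as assumptions).
import requests

resp = requests.post(
    "http://localhost:8000/text-to-image",
    json={
        "model_id": "ByteDance/SDXL-Lightning",
        "prompt": "a photo of a cat",
        "num_images_per_prompt": 10,  # large enough to exhaust a 24 GB GPU
    },
)
print(resp.status_code)  # observed: 500 once the GPU runs out of memory
```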

Expected behaviour

The O/T should be able to specify a max batch size, with a default of 1. Any batch size above the set max would be run sequentially in the ai-runner to produce the requested number of images.

For ByteDance/SDXL-Lightning, the time to process a batch of images in one request is roughly linear in the number of images and close to the time of processing them sequentially (1 image takes ~700 ms, 3 images take ~2.1 s when batched together). It is not exact, but it is close enough that the user experience would be very similar whether the images are generated in one batch or sequentially. I don't expect this to hold for all models, though; testing and experience would drive how this works model to model, but a max batch size of 1 should be a safe start.
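
A minimal sketch of the proposed sequential fallback, assuming a diffusers-style pipeline; `MAX_BATCH_SIZE` and `generate_images` are illustrative names, not the actual ai-runner implementation:

```python
import os
from typing import List

from diffusers import StableDiffusionXLPipeline
from PIL import Image

# Illustrative cap; a default of 1 is the safe starting point suggested above.
MAX_BATCH_SIZE = int(os.getenv("MAX_BATCH_SIZE", "1"))


def generate_images(
    pipe: StableDiffusionXLPipeline, prompt: str, requested: int
) -> List[Image.Image]:
    """Run the requested batch as sequential chunks of at most MAX_BATCH_SIZE images."""
    images: List[Image.Image] = []
    remaining = requested
    while remaining > 0:
        chunk = min(remaining, MAX_BATCH_SIZE)
        # Each chunk is a normal pipeline call; num_images_per_prompt is the
        # standard diffusers batching argument.
        out = pipe(prompt=prompt, num_images_per_prompt=chunk)
        images.extend(out.images)
        remaining -= chunk
    return images
```

A loop like this could also hand back each chunk as it finishes (e.g. by yielding instead of accumulating), which ties into the point below about returning images as they are processed.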

For some requests it may make sense to start returning images, or making them available for download by the B, as they are processed. For fast models like ByteDance/SDXL-Lightning, that is probably not a concern.

It could also be argued that, if tickets are sized to the pixels requested, the batch request should be split between multiple Os to complete the batch faster.

Severity

None

Screenshots / Live demo link

No response

OS

Linux

Running on

Docker

AI-worker version

latest (alpha testnet)

Additional context

No response

@rickstaa
Member

rickstaa commented May 8, 2024

Tracked internally at https://linear.app/livepeer-ai-spe/issue/LIV-172.
