Describe the bug
If a batch size that is too large for the GPU is requested, the container starts throwing 500 errors. I believe the container has not crashed in this case, but it is a bad experience for the user.
Reproduction steps
Send a text-to-image request with a batch size of 10 to the ByteDance/SDXL-Lightning model.
The ai-runner container starts throwing 500 errors as the retries continue when running on an RTX 4090 or a GPU with less VRAM.
Expected behaviour
The O/T should be able to specify a max batch size, with a default of 1. Any request with a batch size above the configured max is run sequentially in the ai-runner to produce the requested batch of images, as in the sketch below.
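As a rough illustration only, here is a minimal sketch of how the ai-runner could chunk an oversized request. The `MAX_BATCH_SIZE` environment variable and the `generate_images` helper are hypothetical and not part of the current ai-runner API; `pipeline` stands in for the loaded diffusers pipeline.

```python
import os

# Hypothetical config knob; defaults to 1 so current behaviour is preserved.
MAX_BATCH_SIZE = int(os.environ.get("MAX_BATCH_SIZE", "1"))


def generate_images(pipeline, prompt, num_images):
    """Run the requested batch in chunks of at most MAX_BATCH_SIZE.

    Anything above the configured max is processed sequentially instead of
    being sent to the GPU as a single large batch.
    """
    images = []
    remaining = num_images
    while remaining > 0:
        chunk = min(remaining, MAX_BATCH_SIZE)
        result = pipeline(prompt, num_images_per_prompt=chunk)
        images.extend(result.images)
        remaining -= chunk
    return images
```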
For ByteDance/SDXL-Lightning, processing a batch of images in one request scales roughly linearly with the time it takes to process the same number of images sequentially (1 image takes 700 ms; 3 images take 2.1 s when batched together). It's not exact, but it's close enough that the user experience would be very similar whether the images are generated as a batch or sequentially. I don't expect this to be the case for all models, though. Testing and experience would drive how this works model to model, but a max batch size of 1 should be a safe start.
For some requests it may make sense to start returning images, or making them available for download by the B, as they are processed. For fast models like ByteDance/SDXL-Lightning, this is probably not a concern.
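If partial delivery were ever wanted, the same chunking loop could yield images as each chunk finishes instead of collecting the whole batch first. Again purely a sketch under the same hypothetical `MAX_BATCH_SIZE` assumption; the current ai-runner returns the full batch in one response.

```python
def generate_images_streaming(pipeline, prompt, num_images):
    """Yield images chunk by chunk so the caller (e.g. the B) can start
    fetching results before the whole batch has finished."""
    remaining = num_images
    while remaining > 0:
        chunk = min(remaining, MAX_BATCH_SIZE)
        result = pipeline(prompt, num_images_per_prompt=chunk)
        yield from result.images
        remaining -= chunk
```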
It could also be argued that, if tickets are sized to the pixels requested, the batch request should be split between multiple Os to get the batch done faster.
Severity
None
Screenshots / Live demo link
No response
OS
Linux
Running on
Docker
AI-worker version
latest (alpha testnet)
Additional context
No response