Struct Objects For Rate Limits #805

Open
CarsonGH opened this issue Jul 23, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@CarsonGH

Is your feature request related to a problem? Please describe.
When using completions concurrently through goroutines, I often hit rate limits, which cause errors and force me to implement retry and backoff logic so that I can complete all my requests. Without knowing the rate limits for each model, I cannot choose proper goroutine delay timings.
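For context, a minimal sketch of the retry/backoff boilerplate this currently forces on callers (completeWithRetry and maxRetries are illustrative names; the client call is go-openai's CreateChatCompletion):

package main

import (
    "context"
    "time"

    openai "github.com/sashabaranov/go-openai"
)

// completeWithRetry retries a chat completion with exponential backoff
// when the API errors (typically a 429 rate-limit response).
func completeWithRetry(ctx context.Context, client *openai.Client, req openai.ChatCompletionRequest, maxRetries int) (openai.ChatCompletionResponse, error) {
    var resp openai.ChatCompletionResponse
    var err error
    for attempt := 0; attempt < maxRetries; attempt++ {
        resp, err = client.CreateChatCompletion(ctx, req)
        if err == nil {
            return resp, nil
        }
        // Exponential backoff: 1s, 2s, 4s, ... before the next attempt.
        time.Sleep(time.Duration(1<<attempt) * time.Second)
    }
    return resp, err
}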

Describe the solution you'd like
Create RateLimit structs that can be generated by tier functions taking model strings (which the library already defines as constants). There would be one function per tier; for example, openai.Tier1RateLimits(openai.GPT4oMini) would return a RateLimit struct of the following:

type RateLimit struct {
    // OpenAI model
    Model string
    // Requests per minute
    RPM int
    // Requests per day
    RPD int
    // Tokens per minute
    TPM int
    // Token size limit for the batch queue
    BatchQueueLimit int
}

With a function as follows:

func TierFreeRateLimits(model string) RateLimit {
    switch model {
    case "gpt-3.5-turbo":
        return RateLimit{Model: model, RPM: 3, RPD: 200, TPM: 40000, BatchQueueLimit: 200000}
    case "text-embedding-3-large", "text-embedding-3-small", "text-embedding-ada-002":
        return RateLimit{Model: model, RPM: 3000, RPD: 200, TPM: 1000000, BatchQueueLimit: 3000000}
    case "whisper-1", "tts-1":
        return RateLimit{Model: model, RPM: 3, RPD: 200}
    case "dall-e-2":
        return RateLimit{Model: model, RPM: 5} // img/min
    case "dall-e-3":
        return RateLimit{Model: model, RPM: 1} // img/min
    default:
        return RateLimit{} // or a RateLimit with unlimited/max values, so new models don't cause blocking
    }
}
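With that in place, callers could derive goroutine delay timings directly from the struct. A sketch of the intended usage (paceRequests and send are illustrative names, not part of the proposal):

package main

import "time"

// paceRequests dispatches work on a ticker spaced so the model's RPM is
// never exceeded; send is a caller-supplied worker function.
func paceRequests(model string, prompts []string, send func(string)) {
    limits := TierFreeRateLimits(model)
    if limits.RPM <= 0 {
        return // unknown model: no limits available (see default case above)
    }
    interval := time.Minute / time.Duration(limits.RPM) // 3 RPM -> one request every 20s

    ticker := time.NewTicker(interval)
    defer ticker.Stop()
    for _, p := range prompts {
        <-ticker.C // wait for the next slot before dispatching
        go send(p)
    }
}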

Describe alternatives you've considered
An alternative would be to implement a function in ratelimit.go that takes the model, sends a request to OpenAI with a message size of 0 and max tokens of 0, parses the response using the newRateLimitHeaders() function, and returns a RateLimitHeaders object, so that rate limits/delays can be initialized before the goroutines start.

Because different models take different parameters (completions vs. whisper vs. dall-e), you would have to set up a dummy query for each model type that uses minimal tokens, and then have a switch or map that checks the model and uses the appropriate request.

This method is somewhat better because an organization could have custom rates that differ from the standard tier limits, so it would return the correct values in those cases.
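A rough sketch of that alternative for the chat-completion case, assuming the library exposes the parsed headers via the GetRateLimitHeaders() accessor built on newRateLimitHeaders() (the dummy request's payload and MaxTokens: 1 are illustrative; a zero MaxTokens is omitted by go-openai, so 1 stands in as a near-zero-cost value):

package main

import (
    "context"

    openai "github.com/sashabaranov/go-openai"
)

// probeRateLimits fires a minimal chat-completion request and reads the
// x-ratelimit-* headers off the response, so delays can be initialized
// before the real workload starts.
func probeRateLimits(ctx context.Context, client *openai.Client, model string) (openai.RateLimitHeaders, error) {
    resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
        Model:     model,
        MaxTokens: 1,
        Messages: []openai.ChatCompletionMessage{
            {Role: openai.ChatMessageRoleUser, Content: "."}, // minimal dummy payload
        },
    })
    if err != nil {
        return openai.RateLimitHeaders{}, err
    }
    return resp.GetRateLimitHeaders(), nil
}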

@CarsonGH CarsonGH added the enhancement New feature or request label Jul 23, 2024