Is your feature request related to a problem? Please describe.
When using completions concurrently through goroutines, I often hit rate limits, which cause errors, so I have to implement retry and backoff logic to complete all my requests. Without knowing the rate limits for each model, I cannot choose proper goroutine delay timings.
Describe the solution you'd like
Create RateLimit structs that can be generated by tier functions taking model strings (which are already defined as constants). There would be one tier function per tier; for example, openai.Tier1RateLimits(openai.GPT4oMini) would return a RateLimit struct like the following:
type RateLimit struct {
	// OpenAI model
	Model string
	// Requests per minute
	RPM int
	// Requests per day
	RPD int
	// Tokens per minute
	TPM int
	// Token size limit for the batch queue
	BatchQueueLimit int
}
With a function as follows:
func TierFreeRateLimits(model string) RateLimit {
	switch model {
	case "gpt-3.5-turbo":
		return RateLimit{Model: model, RPM: 3, RPD: 200, TPM: 40000, BatchQueueLimit: 200000}
	case "text-embedding-3-large", "text-embedding-3-small", "text-embedding-ada-002":
		return RateLimit{Model: model, RPM: 3000, RPD: 200, TPM: 1000000, BatchQueueLimit: 3000000}
	case "whisper-1", "tts-1":
		return RateLimit{Model: model, RPM: 3, RPD: 200}
	case "dall-e-2":
		return RateLimit{Model: model, RPM: 5} // images/min
	case "dall-e-3":
		return RateLimit{Model: model, RPM: 1} // images/min
	default:
		// Alternatively, return a RateLimit with max/unlimited values so that
		// new or unknown models do not cause blocking.
		return RateLimit{}
	}
}
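For illustration, here is a minimal sketch of how such a struct could pace concurrent requests. TierFreeRateLimits and RateLimit are the proposed names from above (they do not exist in the library yet); the rest uses the existing go-openai client API, and the pacing via a ticker is just one possible strategy.

package main

import (
	"context"
	"fmt"
	"time"

	openai "github.com/sashabaranov/go-openai"
)

func main() {
	client := openai.NewClient("your-token")

	// Proposed helper from above; not part of the library today.
	limits := TierFreeRateLimits("gpt-3.5-turbo")

	// Space requests evenly so the RPM limit is never exceeded.
	ticker := time.NewTicker(time.Minute / time.Duration(limits.RPM))
	defer ticker.Stop()

	prompts := []string{"first prompt", "second prompt", "third prompt"}
	done := make(chan struct{}, len(prompts))
	for _, p := range prompts {
		<-ticker.C // wait for the next request slot
		go func(prompt string) {
			defer func() { done <- struct{}{} }()
			resp, err := client.CreateChatCompletion(context.Background(), openai.ChatCompletionRequest{
				Model: openai.GPT3Dot5Turbo,
				Messages: []openai.ChatCompletionMessage{
					{Role: openai.ChatMessageRoleUser, Content: prompt},
				},
			})
			if err != nil {
				fmt.Println("request error:", err)
				return
			}
			fmt.Println(resp.Choices[0].Message.Content)
		}(p)
	}
	for range prompts {
		<-done
	}
}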
Describe alternatives you've considered
An alternative would be to implement a function in ratelimit.go that takes the model, sends a request to OpenAI with message size 0 and max_tokens 0, parses the response with the newRateLimitHeaders() function, and returns a RateLimitHeaders object, so that rate limits/delays can be initialized before starting the goroutines.
Because different models take different parameters (completions vs. whisper vs. dall-e), you would have to set up a dummy query for each model type that uses minimal tokens, and then have a switch or map that checks the model and issues the appropriate request.
This method is somewhat better because an organization could have custom rates that differ from the standard tier limits, so it would return the correct values in those cases.
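A rough sketch of that probe approach, assuming the rate-limit headers are exposed via the existing GetRateLimitHeaders() method on responses; the function name, the use of MaxTokens: 1 (rather than 0, to keep the dummy completion cheap but explicit), and the per-model switch are all illustrative:

package ratelimit

import (
	"context"
	"fmt"

	openai "github.com/sashabaranov/go-openai"
)

// probeRateLimits is a hypothetical helper: it issues a minimal request for
// the given model and returns the parsed rate-limit headers so delays can be
// chosen before launching goroutines.
func probeRateLimits(ctx context.Context, client *openai.Client, model string) (openai.RateLimitHeaders, error) {
	switch model {
	case openai.GPT3Dot5Turbo, openai.GPT4oMini:
		resp, err := client.CreateChatCompletion(ctx, openai.ChatCompletionRequest{
			Model:     model,
			MaxTokens: 1, // keep the dummy completion as cheap as possible
			Messages: []openai.ChatCompletionMessage{
				{Role: openai.ChatMessageRoleUser, Content: "ping"},
			},
		})
		if err != nil {
			return openai.RateLimitHeaders{}, err
		}
		return resp.GetRateLimitHeaders(), nil
	default:
		// whisper, tts, dall-e, embeddings, etc. would each need their own
		// minimal dummy request here, as described above.
		return openai.RateLimitHeaders{}, fmt.Errorf("no probe implemented for model %q", model)
	}
}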