
Throttle Jobs (aka rate-limiting) #103

Open
olttwa opened this issue Feb 13, 2023 · 2 comments
olttwa commented Feb 13, 2023

Difference between Throttle and Rate-Limit

Both Throttling and Rate-Limiting are designed to limit the number of processes running at a given time. However, Rate-Limiting rejects processes that exceed the limit, while Throttling queues/pauses them until the current ones have completed.

Rate-Limiting protects a system by applying a hard limit on its access. Throttling shapes a system by smoothing spikes in traffic.

A background processor shouldn't reject tasks that exceed a limit once they've been queued; rejection is best handled at the load-balancer layer. For these reasons, Goose uses the term Throttling, not Rate-Limiting.

Why the need to Throttle Jobs?

Often, 3rd-party APIs enforce a rate limit. Ergo, the number of Jobs executing at a given time shouldn't exceed that limit.

Patterns of Throttling

As elaborated here, Throttling can be done in 5 ways:

  1. Concurrent: only N Jobs can execute at a given time.
  2. Token Bucket: like Concurrent, but the resource pool is capped at N and refills at a fixed rate, which might be higher or lower than the Job-completion rate.
  3. Leaky Bucket: like Token Bucket, but allows bursts of Jobs within a small time interval. The resource pool can stay fixed (like Concurrent) or grow at a fixed rate (like Token Bucket).
  4. Fixed Window: within a given time frame, only N Jobs can execute.
  5. Sliding Window: like Fixed Window, but the window rolls forward to the most recent executed Jobs instead of resetting at fixed boundaries.
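To make pattern 2 concrete, here is a minimal single-process sketch of a Token Bucket in Python. This is purely illustrative (the class name, parameters, and in-memory state are all made up for this example); a real Goose implementation would keep this state in a persistent store shared across workers:

```python
import time

class TokenBucket:
    """Illustrative single-process Token Bucket (pattern 2).

    Holds at most `capacity` tokens; refills at `rate` tokens/second.
    A job may start only if a token is available.
    """

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1.0)
bucket.try_acquire()  # True: token available
bucket.try_acquire()  # True: second token consumed
bucket.try_acquire()  # False: bucket empty until it refills
```

Setting `rate` higher or lower than the Job-completion rate controls whether the pool tends to grow or drain over time, as described in the list above.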

Nuances of Throttling for a background processor

Keep these things in mind when implementing this feature:

  1. Since executing Jobs acquire a lock, have a lock_timeout so that crashed processes don't hold a lock forever.
  2. Have a wait_timeout so that workers aren't waiting forever to acquire a lock. Upon timing out, users can configure Goose to publish a metric, raise an alert or discard the Job altogether.
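The two timeouts above can be sketched together. The following Python is an in-memory stand-in for what a shared store would provide (class and method names are hypothetical): leases expire after lock_timeout so crashed holders are reaped, and acquire gives up after wait_timeout so the caller can emit a metric, alert, or drop the Job:

```python
import threading
import time

class ThrottledSlots:
    """Sketch: at most `limit` concurrent Jobs, with lock_timeout
    (lease expiry for crashed processes) and per-call wait_timeout."""

    def __init__(self, limit, lock_timeout):
        self.limit = limit
        self.lock_timeout = lock_timeout
        self.leases = {}  # job_id -> monotonic expiry deadline
        self.cond = threading.Condition()

    def _reap(self):
        # Drop leases whose holders exceeded lock_timeout (e.g. crashed).
        now = time.monotonic()
        for job_id, deadline in list(self.leases.items()):
            if deadline <= now:
                del self.leases[job_id]

    def acquire(self, job_id, wait_timeout):
        deadline = time.monotonic() + wait_timeout
        with self.cond:
            while True:
                self._reap()
                if len(self.leases) < self.limit:
                    self.leases[job_id] = time.monotonic() + self.lock_timeout
                    return True
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # caller may publish a metric, alert, or discard
                self.cond.wait(remaining)

    def release(self, job_id):
        with self.cond:
            self.leases.pop(job_id, None)
            self.cond.notify()
```

Returning `False` rather than raising keeps the "what to do on timeout" decision with the caller, matching point 2 above.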

Implementation Details

This is a complex feature to build. Some ideas after initial investigation:

  1. A persistent store is required to track the count of executing Jobs. Hence, this feature can exist for message brokers like Redis and Postgres, but not for RabbitMQ.
  2. If the message broker has built-in support for expiry, that'll be helpful. Otherwise, a separate thread will have to do garbage collection.
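As an illustration of point 2, here is a Sliding Window counter (pattern 5) sketched in Python over a plain list of timestamps. The list plays the role a Redis sorted set of start times would play; `_evict` is the garbage-collection step that a broker's built-in expiry would otherwise handle. All names here are invented for the sketch:

```python
import time

class SlidingWindowCounter:
    """Sketch of a Sliding Window throttle: at most `limit` Jobs may
    start within any rolling window of `window_secs` seconds."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window = window_secs
        self.events = []  # timestamps of recently started Jobs

    def _evict(self, now):
        # GC step: discard start times that fell out of the window.
        cutoff = now - self.window
        self.events = [t for t in self.events if t > cutoff]

    def try_start(self, now=None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

With a store that supports expiry, `_evict` disappears: each recorded start time simply expires after `window_secs`, and counting the live entries gives the same answer.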
olttwa commented Feb 13, 2023

Until Goose supports Throttling, there are 2 hacks that can help achieve it:

  1. If you enqueue Jobs asynchronously, Throttling can be achieved using a combination of the :threads worker config and the number of worker instances. For example, with :threads set to 5 and 4 worker instances running, at most 20 Jobs execute concurrently.
  2. While enqueuing, you can schedule Jobs with a fixed or staggered delay.
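The staggered-delay hack can be sketched as a small helper that spreads Jobs over time (the function below is hypothetical and not part of Goose's API; you'd feed the resulting delays into Goose's scheduled-Job enqueue):

```python
def staggered_delays(n_jobs, per_second):
    """Compute per-Job delays (in seconds) so that at most
    `per_second` Jobs are scheduled to start in any one second."""
    return [i // per_second for i in range(n_jobs)]

staggered_delays(5, 2)  # [0, 0, 1, 1, 2]
```

This only throttles Job start times, not completion overlap, so it pairs best with hack 1's concurrency cap.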

olttwa commented Feb 13, 2023

cc @rickerbh

olttwa added the feature label Feb 15, 2023