
Throttle Jobs (aka rate-limiting) #103

Open
olttwa opened this issue Feb 13, 2023 · 2 comments
olttwa commented Feb 13, 2023

Difference between Throttle and Rate-Limit

Both Throttling and Rate-Limiting are designed to limit the number of processes running at a given time. However, Rate-Limiting rejects processes that exceed the limit, while Throttling queues/pauses them until the current ones have completed.

Rate-Limiting protects a system by applying a hard limit on its access. Throttling shapes a system by smoothing spikes in traffic.

A background processor shouldn't reject tasks that exceed a limit once they've been queued; rejection is best handled at the load-balancer layer. For these reasons, Goose uses the term Throttling, not Rate-Limiting.

Why the need to Throttle Jobs?

Often, 3rd-party APIs enforce a rate limit. Ergo, the number of Jobs executing at a given time shouldn't exceed that limit.

Patterns of Throttling

As elaborated here, Throttling can be done in 5 ways:

  1. Concurrent: only N Jobs can execute at a given time.
  2. Token Bucket: like Concurrent, but the resource pool is capped at N and refills at a fixed rate, which might be higher or lower than the Job-completion rate.
  3. Leaky Bucket: like Token Bucket, but allows bursts of Jobs within a small time interval. The resource pool can stay fixed (like Concurrent) or grow at a fixed rate (like Token Bucket).
  4. Fixed Window: within a given time frame, only N Jobs can execute.
  5. Sliding Window: like Fixed Window, but the window rolls forward to the most recent executed Jobs instead of resetting at fixed boundaries.
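To make pattern 2 concrete, here is a minimal single-process sketch of a Token Bucket in Python. This is purely illustrative (the class name, parameters, and in-memory state are all made up for this example); a real Goose implementation would keep this state in a persistent store shared across workers:

```python
import time

class TokenBucket:
    """Illustrative single-process Token Bucket (pattern 2).

    Holds at most `capacity` tokens; refills at `rate` tokens/second.
    A job may start only if a token is available.
    """

    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def try_acquire(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=2, rate=1.0)
bucket.try_acquire()  # True: token available
bucket.try_acquire()  # True: second token consumed
bucket.try_acquire()  # False: bucket empty until it refills
```

Setting `rate` higher or lower than the Job-completion rate controls whether the pool tends to grow or drain over time, as described in the list above.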

Nuances of Throttling for a background processor

Keep these things in mind when implementing this feature:

  1. Since executing Jobs acquire a lock, have a lock_timeout so that crashed processes don't hold a lock forever.
  2. Have a wait_timeout so that workers aren't waiting forever to acquire a lock. Upon timing out, users can configure Goose to publish a metric, raise an alert or discard the Job altogether.
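The two timeouts above can be sketched together. The following Python is an in-memory stand-in for what a shared store would provide (class and method names are hypothetical): leases expire after lock_timeout so crashed holders are reaped, and acquire gives up after wait_timeout so the caller can emit a metric, alert, or drop the Job:

```python
import threading
import time

class ThrottledSlots:
    """Sketch: at most `limit` concurrent Jobs, with lock_timeout
    (lease expiry for crashed processes) and per-call wait_timeout."""

    def __init__(self, limit, lock_timeout):
        self.limit = limit
        self.lock_timeout = lock_timeout
        self.leases = {}  # job_id -> monotonic expiry deadline
        self.cond = threading.Condition()

    def _reap(self):
        # Drop leases whose holders exceeded lock_timeout (e.g. crashed).
        now = time.monotonic()
        for job_id, deadline in list(self.leases.items()):
            if deadline <= now:
                del self.leases[job_id]

    def acquire(self, job_id, wait_timeout):
        deadline = time.monotonic() + wait_timeout
        with self.cond:
            while True:
                self._reap()
                if len(self.leases) < self.limit:
                    self.leases[job_id] = time.monotonic() + self.lock_timeout
                    return True
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    return False  # caller may publish a metric, alert, or discard
                self.cond.wait(remaining)

    def release(self, job_id):
        with self.cond:
            self.leases.pop(job_id, None)
            self.cond.notify()
```

Returning `False` rather than raising keeps the "what to do on timeout" decision with the caller, matching point 2 above.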

Implementation Details

This is a complex feature to build. Some ideas after initial investigation:

  1. A persistent store is required to track the count of executing Jobs. Hence, this feature can exist for message brokers like Redis and Postgres, but not for RabbitMQ.
  2. If the message broker has built-in support for expiry, that'll be helpful. Otherwise, a separate thread will have to do garbage collection.
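As an illustration of point 2, here is a Sliding Window counter (pattern 5) sketched in Python over a plain list of timestamps. The list plays the role a Redis sorted set of start times would play; `_evict` is the garbage-collection step that a broker's built-in expiry would otherwise handle. All names here are invented for the sketch:

```python
import time

class SlidingWindowCounter:
    """Sketch of a Sliding Window throttle: at most `limit` Jobs may
    start within any rolling window of `window_secs` seconds."""

    def __init__(self, limit, window_secs):
        self.limit = limit
        self.window = window_secs
        self.events = []  # timestamps of recently started Jobs

    def _evict(self, now):
        # GC step: discard start times that fell out of the window.
        cutoff = now - self.window
        self.events = [t for t in self.events if t > cutoff]

    def try_start(self, now=None):
        now = time.monotonic() if now is None else now
        self._evict(now)
        if len(self.events) < self.limit:
            self.events.append(now)
            return True
        return False
```

With a store that supports expiry, `_evict` disappears: each recorded start time simply expires after `window_secs`, and counting the live entries gives the same answer.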
olttwa commented Feb 13, 2023

Until Goose supports Throttling, there are 2 hacks that can help achieve it:

  1. If you enqueue Jobs asynchronously, Throttling can be achieved using a combination of the :threads worker config and the number of worker instances. For example, with :threads set to 5 and 4 worker instances running, at most 20 Jobs execute concurrently.
  2. While enqueuing, you can schedule Jobs with a fixed or staggered delay.
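The staggered-delay hack can be sketched as a small helper that spreads Jobs over time (the function below is hypothetical and not part of Goose's API; you'd feed the resulting delays into Goose's scheduled-Job enqueue):

```python
def staggered_delays(n_jobs, per_second):
    """Compute per-Job delays (in seconds) so that at most
    `per_second` Jobs are scheduled to start in any one second."""
    return [i // per_second for i in range(n_jobs)]

staggered_delays(5, 2)  # [0, 0, 1, 1, 2]
```

This only throttles Job start times, not completion overlap, so it pairs best with hack 1's concurrency cap.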

olttwa commented Feb 13, 2023

cc @rickerbh

olttwa added the feature label Feb 15, 2023