Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce support for error specific retry policy overrides #331

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
21 changes: 21 additions & 0 deletions temporal/api/common/v1/message.proto
Original file line number Diff line number Diff line change
Expand Up @@ -92,6 +92,24 @@ message ActivityType {
string name = 1;
}

// Backoff setting for a specific error. This message is designed to be used as a value
// of error backoff overrides in the RetryPolicy message.
message BackoffSettings {
// Interval of the first retry. If retryBackoffCoefficient is 1.0 then it is used for all retries.
google.protobuf.Duration initial_interval = 1 [(gogoproto.stdduration) = true];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to remove these gogoproto annotations; they're no longer supported

// Coefficient used to calculate the next retry interval.
// The next retry interval is previous interval multiplied by the coefficient.
// Must be 1 or larger.
double backoff_coefficient = 2;
// Maximum interval between retries. Exponential backoff leads to interval increase.
// This value is the cap of the increase. Default is 100x of the initial interval.
google.protobuf.Duration maximum_interval = 3 [(gogoproto.stdduration) = true];
// Maximum number of attempts. When exceeded the retries stop even if not expired yet.
// 1 disables retries. 0 means unlimited (up to the timeouts)
int32 maximum_attempts = 4;
}


// How retries ought to be handled, usable by both workflows and activities
message RetryPolicy {
// Interval of the first retry. If retryBackoffCoefficient is 1.0 then it is used for all retries.
Expand All @@ -109,6 +127,9 @@ message RetryPolicy {
// Non-Retryable errors types. Will stop retrying if the error type matches this list. Note that
// this is not a substring match, the error *type* (not message) must match exactly.
repeated string non_retryable_error_types = 5;
// Customize backoff settings for specific errors.
// Similar to 'Non-Retryable', the key should precisely match the error *type*.
map<string, BackoffSettings> error_backoff_overrides = 6;
Comment on lines +130 to +132
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Customize backoff settings for specific errors.
// Similar to 'Non-Retryable', the key should precisely match the error *type*.
map<string, BackoffSettings> error_backoff_overrides = 6;
// Customize backoff settings for specific errors.
// Similar to 'Non-Retryable', the key should match the error *type*
// in a manner appropriate for the SDK language. EX: `ErrorTypeName` for Go.
map<string, BackoffSettings> error_backoff_overrides = 6;

The problem here is we really aren't "precise", at least not in every language, and it's language dependent. In Go for example non-retryable errors are specified using only the type name rather than a fully qualified type name which would be much more precise. So I'm not sure the word "precisely" belongs here, at least not in all cases.

We'll have to make sure in the SDKs that we're clear in the docstring about what is actually being matched.

}

// Metadata relevant for metering purposes
Expand Down