Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry on configurable exception #6991

Merged
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -1,2 +1,7 @@
Comparing source compatibility of opentelemetry-sdk-common-1.46.0-SNAPSHOT.jar against opentelemetry-sdk-common-1.45.0.jar
No changes.
**** MODIFIED CLASS: PUBLIC ABSTRACT io.opentelemetry.sdk.common.export.RetryPolicy (not serializable)
=== CLASS FILE FORMAT VERSION: 52.0 <- 52.0
+++* NEW METHOD: PUBLIC(+) ABSTRACT(+) java.util.function.Predicate<java.io.IOException> getRetryExceptionPredicate()
**** MODIFIED CLASS: PUBLIC ABSTRACT STATIC io.opentelemetry.sdk.common.export.RetryPolicy$RetryPolicyBuilder (not serializable)
=== CLASS FILE FORMAT VERSION: 52.0 <- 52.0
+++* NEW METHOD: PUBLIC(+) ABSTRACT(+) io.opentelemetry.sdk.common.export.RetryPolicy$RetryPolicyBuilder setRetryExceptionPredicate(java.util.function.Predicate<java.io.IOException>)
Original file line number Diff line number Diff line change
Expand Up @@ -1000,7 +1000,7 @@ void stringRepresentation() throws IOException, CertificateEncodingException {
+ ", "
+ "compressorEncoding=gzip, "
+ "headers=Headers\\{.*foo=OBFUSCATED.*\\}, "
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3\\}"
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3, retryExceptionPredicate=io.opentelemetry.sdk.common.export.RetryPolicy.*\\}"
+ ".*" // Maybe additional grpcChannel field, signal specific fields
+ "\\}");
} finally {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -904,7 +904,7 @@ void stringRepresentation() throws IOException, CertificateEncodingException {
+ ", "
+ "exportAsJson=false, "
+ "headers=Headers\\{.*foo=OBFUSCATED.*\\}, "
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3\\}"
+ "retryPolicy=RetryPolicy\\{maxAttempts=2, initialBackoff=PT0\\.05S, maxBackoff=PT3S, backoffMultiplier=1\\.3, retryExceptionPredicate=io.opentelemetry.sdk.common.export.RetryPolicy.*\\}"
+ ".*" // Maybe additional signal specific fields
+ "\\}");
} finally {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,9 @@ public RetryInterceptor(RetryPolicy retryPolicy, Function<Response, Boolean> isR
this(
retryPolicy,
isRetryable,
RetryInterceptor::isRetryableException,
e ->
retryPolicy.getRetryExceptionPredicate().test(e)
|| RetryInterceptor.isRetryableException(e),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The OR here is interesting. It means a user can choose to expand the definition of what is retryable but not reduce it. I wonder if there are any cases when you would not want to retry when the default would retry. 🤔

Copy link
Contributor Author

@YuriyHolinko YuriyHolinko Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a user can choose to expand the definition of what is retryable but not reduce it

it's exactly the idea

I wonder if there are any cases when you would not want to retry when the default would retry

I would say no 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say no 🤔

I think we might..

Suppose we want expose options to give full control to the user over what's retryable (as I alluded to in the end of this comment), we'd probably want to do something like:

  • Expose a single configurable predicate option of the form setRetryPredicate(Predicate<Throwable>)
  • Funnel all failed requests through this predicate, whether they resolved a response with a status code or ended with an exception
  • This means we'd need to translate requests with a non-200 status code to an equivalent exception to pass to the predicate
  • If the user doesn't define their own predicate, default to one that retriable when status is retryable (429, 502, 503, 504) or when the exception is retryable (like one of the SocketTimeoutException we've discussed).

In this case, its possible that a user doesn't want to retry on a particular response status code like 502, even when the default behavior is to retry on it.

Copy link
Contributor Author

@YuriyHolinko YuriyHolinko Jan 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not fully understand the idea

It means a user can choose to expand the definition of what is retryable but not reduce it. I wonder if there are any cases when you would not want to retry when the default would retry.

the current implementation allows only to extend retry conditions, but we can change it to override or extend so user will have this option and it's probably what you meant by this comment below

Expose a single configurable predicate option of the form setRetryPredicate(Predicate)

so we could achieve it

Funnel all failed requests through this predicate, whether they resolved a response with a status code or ended with an exception

we probably should not mix status codes and exception in the same predicate but use two predicates separately. what do you think ? 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we probably should not mix status codes and exception in the same predicate but use two predicates separately. what do you think ? 🤔

I've seen HTTP clients generate exceptions when the status code is 4xx or 5xx, and this is essentially going in that direction. All non-200 results flow flow through a Predicate<Throwable>, and the caller can setup different rules for different Throwables. Supposing we translate non-200 responses to an exception class like our HttpExportException, example usage might look like:

// Override default retry policy to retry on all all 5xx errors, and on any IOException
builder.setRetryPredicate(throwable -> {
  if (thowable instanceOf HttpExportException) {
    int httpStatusCode = ((HttpExportException) throwable).getResponse().statusCode();
    return httpStatusCode >= 500;
  }
  return throwable instanceOf IOException;
});

I suppose its not super important if exceptions and error responses are handled in a single predicate vs. two separate predicates, but I think my point here is still valid, which suggests we should give the user the ability to not retry on some IOException we normally retry by default:

If the user doesn't define their own predicate, default to one that retriable when status is retryable (429, 502, 503, 504) or when the exception is retryable (like one of the SocketTimeoutException we've discussed).

Copy link
Contributor Author

@YuriyHolinko YuriyHolinko Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit confused.
Do you want to say that currently we do not retry on default retryable status codes?

so in case of 429, 502, 503, 504 we throw exception in line 91 and line 96 is never executed

try {
response = chain.proceed(chain.request());
} catch (IOException e) {
exception = e;
}
if (response != null) {
boolean retryable = Boolean.TRUE.equals(isRetryable.apply(response));
if (logger.isLoggable(Level.FINER)) {
logger.log(
Level.FINER,
"Attempt "
+ attempt
+ " returned "
+ (retryable ? "retryable" : "non-retryable")
+ " response: "
+ responseStringRepresentation(response));
}
if (!retryable) {
return response;
}
}

TimeUnit.NANOSECONDS::sleep,
bound -> ThreadLocalRandom.current().nextLong(bound));
}
Expand Down Expand Up @@ -144,6 +146,11 @@ private static String responseStringRepresentation(Response response) {
return joiner.toString();
}

// Visible for testing
boolean shouldRetryOnException(IOException e) {
return isRetryableException.apply(e);
}

// Visible for testing
static boolean isRetryableException(IOException e) {
if (e instanceof SocketTimeoutException) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@
import io.opentelemetry.sdk.common.export.RetryPolicy;
import java.io.IOException;
import java.net.ConnectException;
import java.net.HttpRetryException;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.time.Duration;
Expand Down Expand Up @@ -61,7 +62,9 @@ void setUp() {
new Function<IOException, Boolean>() {
@Override
public Boolean apply(IOException exception) {
return RetryInterceptor.isRetryableException(exception);
return RetryInterceptor.isRetryableException(exception)
|| (exception instanceof HttpRetryException
&& exception.getMessage().contains("timeout retry"));
}
});
retrier =
Expand Down Expand Up @@ -214,20 +217,42 @@ private OkHttpClient connectTimeoutClient() {
void isRetryableException() {
// Should retry on connection timeouts, where error message is "Connect timed out" or "connect
// timed out"
assertThat(
RetryInterceptor.isRetryableException(new SocketTimeoutException("Connect timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("Connect timed out")))
.isTrue();
assertThat(
RetryInterceptor.isRetryableException(new SocketTimeoutException("connect timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("connect timed out")))
.isTrue();
// Shouldn't retry on read timeouts, where error message is "Read timed out"
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException("Read timed out")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("Read timed out")))
.isFalse();
// Shouldn't retry on write timeouts, where error message is "timeout", or other IOException
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException("timeout")))
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException("timeout"))).isFalse();
assertThat(retrier.shouldRetryOnException(new SocketTimeoutException())).isTrue();
assertThat(retrier.shouldRetryOnException(new IOException("error"))).isFalse();

// Testing configured predicate
assertThat(retrier.shouldRetryOnException(new HttpRetryException("error", 400))).isFalse();
assertThat(retrier.shouldRetryOnException(new HttpRetryException("timeout retry", 400)))
.isTrue();
}
@Test
void isRetryableExceptionDefaultBehaviour() {
RetryInterceptor retryInterceptor = new RetryInterceptor(RetryPolicy.getDefault(),
OkHttpHttpSender::isRetryable);
assertThat(retryInterceptor
.shouldRetryOnException(new SocketTimeoutException("Connect timed out"))).isTrue();
assertThat(retryInterceptor.shouldRetryOnException(new IOException("Connect timed out")))
.isFalse();
assertThat(RetryInterceptor.isRetryableException(new SocketTimeoutException())).isTrue();
assertThat(RetryInterceptor.isRetryableException(new IOException("error"))).isFalse();
}
@Test
void isRetryableExceptionCustomRetryPredicate() {
RetryInterceptor retryInterceptor = new RetryInterceptor(RetryPolicy.builder()
.setRetryExceptionPredicate((IOException e) -> e.getMessage().equals("retry")).build(),
OkHttpHttpSender::isRetryable);

assertThat(retryInterceptor.shouldRetryOnException(new IOException("some message"))).isFalse();
assertThat(retryInterceptor.shouldRetryOnException(new IOException("retry"))).isTrue();
assertThat(retryInterceptor
.shouldRetryOnException(new SocketTimeoutException("Connect timed out"))).isTrue();
}

private Response sendRequest() throws IOException {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,9 @@
import static io.opentelemetry.api.internal.Utils.checkArgument;

import com.google.auto.value.AutoValue;
import java.io.IOException;
import java.time.Duration;
import java.util.function.Predicate;

/**
* Configuration for exporter exponential retry policy.
Expand Down Expand Up @@ -43,7 +45,8 @@ public static RetryPolicyBuilder builder() {
.setMaxAttempts(DEFAULT_MAX_ATTEMPTS)
.setInitialBackoff(Duration.ofSeconds(DEFAULT_INITIAL_BACKOFF_SECONDS))
.setMaxBackoff(Duration.ofSeconds(DEFAULT_MAX_BACKOFF_SECONDS))
.setBackoffMultiplier(DEFAULT_BACKOFF_MULTIPLIER);
.setBackoffMultiplier(DEFAULT_BACKOFF_MULTIPLIER)
.setRetryExceptionPredicate((e) -> false);
}

/**
Expand All @@ -66,6 +69,9 @@ public static RetryPolicyBuilder builder() {
/** Returns the backoff multiplier. */
public abstract double getBackoffMultiplier();

/** Returns the predicate if exception is retryable. */
public abstract Predicate<IOException> getRetryExceptionPredicate();
jack-berg marked this conversation as resolved.
Show resolved Hide resolved

/** Builder for {@link RetryPolicy}. */
@AutoValue.Builder
public abstract static class RetryPolicyBuilder {
Expand Down Expand Up @@ -96,6 +102,13 @@ public abstract static class RetryPolicyBuilder {
*/
public abstract RetryPolicyBuilder setBackoffMultiplier(double backoffMultiplier);

/**
* Set the predicate to determine if retry should happen based on exception. No retry by
* default.
*/
public abstract RetryPolicyBuilder setRetryExceptionPredicate(
Predicate<IOException> retryExceptionPredicate);
Comment on lines +110 to +111
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you consider a customizer instead, to allow choice of either replacing or enhancing the default predicate?

I realize it's not the friendliest API, but we've found the pattern really useful in AutoConfigurationCustomizer

Suggested change
public abstract RetryPolicyBuilder setRetryExceptionPredicate(
Predicate<IOException> retryExceptionPredicate);
public abstract RetryPolicyBuilder setRetryExceptionPredicateCustomizer(
Fuction<Predicate<IOException>, Predicate<IOException>> retryExceptionCustomizer);

Copy link
Contributor Author

@YuriyHolinko YuriyHolinko Jan 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and later use something like this

    static class MyFunction implements Function<Predicate<IOException>, Predicate<IOException>> {
        private Predicate<IOException> newPredicate = e -> {
            return false; // logic for  my retryable condition
        };

        @Override
        public Predicate<IOException> apply(Predicate<IOException> defaultPredicate) {
            /**
             * use this statement to extend
             */
            return newPredicate.or(defaultPredicate); // extend


            /**
             * or this to override
             */
            return newPredicate;
        }
    }

it depends how many users will want to extend it. I started the PR quickly with a thought that I need to extend rertry policy, but after some time I realized if a user wants to change the default policy it's ok just to override.

but pls tell me if you think it's nice to have the enhance option for it and I will cover it in this PR and next PR.
Please note that I also have a plan to change the defaults for retryable exception.

Copy link
Member

@jack-berg jack-berg Jan 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage of a predicate customizer is that you can do things like:

  • return defaultPredicate.and(..)
  • return defaultPredicate.or(..)

If you're just wholesale overwriting the default predicate, you don't get anything from having a customizer:

  • return myCustomPredicate

This benefit is somewhat diminished if we make the default retry predicates accessible as part of our stable API (related to #6970), in which case, the and and or cases can be achieved without a customizer by referencing the default predicate API.

I realize it's not the friendliest API

This is the key bit to me. I'd say that API ergonomics feel really foreign to someone who hasn't written a AutoConfigurationCustomizer. That pattern doesn't show up anywhere else in our API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say that API ergonomics feel really foreign to someone who hasn't written a AutoConfigurationCustomizer. That pattern doesn't show up anywhere else in our API.

good point


abstract RetryPolicy autoBuild();

/** Build and return a {@link RetryPolicy} with the values of this builder. */
Expand Down
Loading