-
Notifications
You must be signed in to change notification settings - Fork 858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data loss if issues on TCP protocol layer or failures on network link. Retry policy is ignored #6962
Comments
Thanks for the report. I recommend for now we adjust the logic to catch this case you've found. If it becomes a recurring problem, we can consider making it programmatically configurable, but let's opt for the simple solution for now. PRs welcome! |
thanks for reply in my case it will be 3 exceptions when I want to retry. if we use SSL under the hood there will be more, because when servers (log receivers) are crashed/failed, ssl handshake exceptions might be thrown and we need to handle it as well. that's why I want to make it configurable in any case I start following procedure how to become a contributor and get a member access to repo to create my first PR |
hi @YuriyHolinko! no special access needed, we recommend to fork the repo and send PR from your fork |
Is your feature request related to a problem? Please describe.
I have noticed data loss in shipping of logs even if retry policy is used. the problem is that we do not have any strong logic around retryable exceptions and if there is some issue in network link we do don't re-send the data but just swallow the exception
the code which decides retry or not is here
opentelemetry-java/exporters/sender/okhttp/src/main/java/io/opentelemetry/exporter/sender/okhttp/internal/RetryInterceptor.java
Line 148 in 1fcbed8
but it does not handle all the issues with network link failures e.g. my use case
java.net.SocketTimeoutException: timeout
because the message does not match a condition
message == null || message.toLowerCase(Locale.ROOT).contains("connect timed out");
Describe the solution you'd like
I suggest to make this condition
Describe alternatives you've considered
I cannot define
Additional context
I can bring a draft PR with my fix as I am not using OTEL java agents to ship logs but initiate it programmatically . Hope it will be useful for everyone
The text was updated successfully, but these errors were encountered: