SO0111-SHARR-sendNotifications lambda (via the layer) returns ResourceAlreadyExistsException when a log stream already exists #218
Comments
Hi Ken,
After looking into it, it seems that the error in the logs you provided is occurring on line 73 of applogger.py, which is during the second attempt at calling create_log_stream. It doesn't seem like the execution ever enters the first if-clause at all (if type(e).__name__ == "ResourceAlreadyExistsException":), instead entering the elif clause, which then raises the exception. For this reason, I don't think your code would be executed in the error scenario you're experiencing. I.e., the first call to create_log_stream raises a ResourceNotFoundException, then the second call raises a ResourceAlreadyExistsException.
The flush method will indeed add the sequence token if the stream exists, since the _create_log_stream method will return the name of the existing log stream when a ResourceAlreadyExistsException is encountered, and the sequence token is added in line 122 of applogger.py.
Examining the timestamps of the logs you've provided, it seems like the first and second runs of send_notifications are happening in parallel, which means a race condition is the likely cause of the exception here. In other words, both instances of the lambda are trying to create and use the log stream at the same time. Unfortunately there isn't an easy fix for this; however, as a temporary workaround you can set the concurrency of the send_notifications lambda function to 1, meaning only one instance of the function can run at a time.
One important thing to note: the flush method is the final action of the send_notifications function, meaning this exception will not have any effect on the core functionality of the solution or the notifications step.
We will prioritize a fix for this issue to include with the next enhancement release of the solution.
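For reference, a minimal sketch of that workaround, assuming the function name shown in the logs above (SO0111-SHARR-sendNotifications); the same reserved-concurrency setting can also be applied from the Lambda console:

```python
import boto3

# Sketch only: cap the sendNotifications function at a single concurrent
# execution so parallel invocations cannot race on the same log stream.
lambda_client = boto3.client("lambda")

lambda_client.put_function_concurrency(
    FunctionName="SO0111-SHARR-sendNotifications",  # assumed function name
    ReservedConcurrentExecutions=1,
)
```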
Thanks! My "test" which probably caused the race condition was manually selecting 4 "Athena.4" findings and taking the action "Remediate with ASR" inside Security Hub.
I did see it's the last step (or almost the last, unless Jira/ServiceNow is used), but it still made the orchestration show up with the big red "Failed"! Anyway, thanks for the information. As I'm working in a complete test PMA/MPA, I'm the only one triggering things; it sounds like in "real life", with a bunch of EventBridge rules doing the triggering, the race condition should not happen.
I'm glad I found a "bad scenario" even if I didn't find the right solution! I appreciate all the quick analysis! I do try to "figure it out" myself, so I can keep my company's fork up to date with our customizations, before running with an "it doesn't work" post.
No problem, glad to help. Feel free to reach out if you have any further questions about the solution. I will leave this issue open until a fix is introduced in the next release.
Describe the bug
Log streams are written with the help of the _create_log_stream method in applogger.py in the orchestrator Lambda layer. If the log stream name already exists, the lambda fails.
To Reproduce
Have the orchestrator Step Function run more than once per day.
Expected behavior
Log stream names in the SO0111-SHARR log group now include the date. If the notify lambda runs multiple times in the same day, the log stream (e.g. SHARR-2025-01-09) will already exist. The expectation is that when a log stream already exists, the orchestrator lambdas will not raise an exception; instead they should get the next sequence token from the existing log stream, to be used in subsequent put_log_events calls.
Please complete the following information about the solution:
FIRST RUN. (Logs of /aws/lambda/SO0111-SHARR-sendNotifications):
SECOND RUN (Logs of /aws/lambda/SO0111-SHARR-sendNotifications):
Screenshots
If applicable, add screenshots to help explain your problem (please DO NOT include sensitive information).
Additional context
I made my own change to get by this issue, but I'm not confident enough to submit a pull request at this time.
Updated code in layer applogger.py
When catching ResourceAlreadyExistsException, we now fetch the current sequence token for the existing stream using describe_log_streams. This sequence token is stored in self._stream_token which will be used in subsequent put_log_events calls. This modification ensures that when you encounter an existing stream, you'll get the correct sequence token right away, which should prevent the error and allow for proper log writing to existing streams. The sequence token is important because CloudWatch Logs uses it to maintain the order of log events and prevent duplicate entries. When a stream already exists, we need to know the current sequence token to append new logs correctly.
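A minimal sketch of what that change could look like, assuming the client and attribute names used here (client, log_group, _stream_token); the actual names and structure in applogger.py may differ:

```python
import boto3


class AppLogger:
    """Illustrative fragment only, not the actual applogger.py implementation."""

    def __init__(self, log_group: str):
        self.client = boto3.client("logs")
        self.log_group = log_group
        self._stream_token = None

    def _create_log_stream(self, stream_name: str) -> str:
        try:
            self.client.create_log_stream(
                logGroupName=self.log_group, logStreamName=stream_name
            )
        except self.client.exceptions.ResourceAlreadyExistsException:
            # The stream was created by another invocation (or an earlier run
            # the same day). Fetch its current sequence token so later
            # put_log_events calls can append to it instead of failing.
            response = self.client.describe_log_streams(
                logGroupName=self.log_group,
                logStreamNamePrefix=stream_name,
                limit=1,
            )
            streams = response.get("logStreams", [])
            if streams:
                self._stream_token = streams[0].get("uploadSequenceToken")
        return stream_name
```

The describe_log_streams call here is also why the extra logs:DescribeLogStreams permission mentioned below is needed.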
Additionally, I added the permission
logs:DescribeLogStreams
to all the policies that deal with writing logs, since any lambda using the layer may run this common code.
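For anyone making the same change, a hedged example of the extra statement, shown as an inline policy added with boto3; the role name and resource ARN below are placeholders, not the solution's actual names:

```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical inline policy granting the extra permission; scope the
# Resource to the solution's log group rather than a broad wildcard.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "logs:DescribeLogStreams",
            "Resource": "arn:aws:logs:*:*:log-group:SO0111-SHARR*",
        }
    ],
}

iam.put_role_policy(
    RoleName="SO0111-SendNotifications-Role",  # placeholder role name
    PolicyName="AllowDescribeLogStreams",
    PolicyDocument=json.dumps(policy_document),
)
```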