add eTagValidation for uploadFileMultipart #1130
…est at the end; same for the max partNumber aka partId
Refactored `S3.scala` to optimize `checksumPart` and `formatChecksum` functions, thereby improving code readability and performance. Added `S3Benchmark.scala` and `S3OpsStub.scala` for benchmarking S3 multi-part uploads.
This way it can be verified and eventually incorporated in the return from uploadFileMultipart in future versions.
@semenodm I don't really have any more items I want to address. Let me know if I need to make any additional changes or if you have questions on anything.
Modified the uploadPart and checksumPart methods in S3.scala. A chunk of bytes is now converted to an array only once, and that array is used both for uploading to S3 and for computing the MD5 checksum. This minimizes the number of times the byte data is processed, which can improve performance, especially for larger files.
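For anyone skimming the thread, the single-conversion idea can be sketched roughly as below. This is only an illustration and not the code in S3.scala: `SingleConversionSketch`, `upload`, `md5Hex` and `uploadAndDigest` are hypothetical names, `Seq[Byte]` stands in for whatever chunk type the library uses, and `upload` is a stub that does not talk to S3.

```scala
import java.security.MessageDigest

object SingleConversionSketch {
  // Hypothetical stand-in for the S3 upload call; it only reports how many bytes it received.
  def upload(bytes: Array[Byte]): Int = bytes.length

  // Hex-encoded MD5 of the given bytes.
  def md5Hex(bytes: Array[Byte]): String =
    MessageDigest.getInstance("MD5").digest(bytes).map(b => f"${b & 0xff}%02x").mkString

  // Convert the chunk to an array once and pass the same array to both steps.
  def uploadAndDigest(chunk: Seq[Byte]): (Int, String) = {
    val bytes = chunk.toArray
    (upload(bytes), md5Hex(bytes))
  }
}
```

The point is simply that `chunk.toArray` appears once, so the upload and the digest share the same materialized array.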
@BusyByte I found the optimization for the digest; take a look at the latest commits.
@semenodm I'm fine with the changes you made. Yeah, depending on the performance impact we could change it to not digest if there's no validator defined.
Doing this.
Changed the 'digest' field in the PartProcessingOutcome case class to be optional due to the multipartETagValidation changes. Modified the getOverallChecksum and upload parts methods accordingly, to handle Option[PartDigest] safely and efficiently. This is done to prevent unnecessary digest calculations during multipart upload when checksum validation is not necessary
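A rough sketch of how an optional per-part digest might flow into an overall checksum is below. The case classes only borrow the names mentioned in this commit and are not the library's definitions; `processPart`, `overallChecksum` and the `validate` flag are hypothetical. The combining step assumes the commonly observed S3 multipart eTag format (MD5 over the concatenated binary part digests, hex-encoded, followed by `-<part count>`), which may not hold for every bucket configuration.

```scala
import java.security.MessageDigest

object OptionalDigestSketch {
  // Hypothetical shapes, loosely named after the types mentioned above.
  final case class PartDigest(bytes: Array[Byte])
  final case class PartProcessingOutcome(partNumber: Int, digest: Option[PartDigest])

  private def md5(bytes: Array[Byte]): Array[Byte] =
    MessageDigest.getInstance("MD5").digest(bytes)

  // Digest a part only when validation was requested; otherwise skip the work.
  def processPart(partNumber: Int, bytes: Array[Byte], validate: Boolean): PartProcessingOutcome =
    PartProcessingOutcome(partNumber, if (validate) Some(PartDigest(md5(bytes))) else None)

  // Assumed eTag format: hex(MD5(concatenated part digests)) + "-" + number of parts.
  // Returns None when there are no parts or any part lacks a digest.
  def overallChecksum(outcomes: List[PartProcessingOutcome]): Option[String] = {
    val sorted  = outcomes.sortBy(_.partNumber)
    val digests = sorted.flatMap(_.digest)
    if (digests.isEmpty || digests.length != sorted.length) None
    else {
      val combined = md5(digests.flatMap(_.bytes.toList).toArray)
      Some(combined.map(b => f"${b & 0xff}%02x").mkString + "-" + digests.length)
    }
  }
}
```

With `validate = false` every outcome carries `None`, so no MD5 work happens and `overallChecksum` also returns `None`.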
@BusyByte thank you for the contribution, this was a nice one.
@semenodm just some notes for a future major release I think it'd be good to return from uploadFileMultipart
We send events and record metrics with this information, and exposing it on the return from uploadFileMultipart would enable these to be captured.
Computes an eTag locally and validates that it matches the eTag returned by AWS S3.
Made the validator an Option with a default of None so that existing consumers of the function are not broken and callers can opt in to the new behavior rather than being forced into it.
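The opt-in shape described here could look something like the sketch below. Everything in it is hypothetical and for illustration only: `ETagValidator`, `strictValidator` and `finishUpload` are made-up names, not the signatures added in this PR, and the quote stripping assumes S3 may return the eTag wrapped in double quotes.

```scala
object ValidatorSketch {
  // Hypothetical validator shape: (eTag reported by S3, eTag computed locally) => error or success.
  type ETagValidator = (String, String) => Either[String, Unit]

  // Compares the two eTags, tolerating surrounding quotes on the S3 value.
  val strictValidator: ETagValidator = (fromS3, computed) =>
    if (fromS3.stripPrefix("\"").stripSuffix("\"") == computed) Right(())
    else Left(s"eTag mismatch: S3 reported $fromS3, computed $computed")

  // Hypothetical stand-in for the end of an upload. The computed eTag is passed
  // by name, so it is never evaluated unless a validator is supplied.
  def finishUpload(
      eTagFromS3: String,
      computedETag: => String,
      validator: Option[ETagValidator] = None
  ): Either[String, String] =
    validator match {
      case None    => Right(eTagFromS3) // default: nothing computed, nothing checked
      case Some(v) => v(eTagFromS3, computedETag).map(_ => eTagFromS3)
    }
}
```

Existing call sites that omit the last argument keep their current behavior, while a caller that wants the check passes `Some(ValidatorSketch.strictValidator)`.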
We have tested this and verified with S3 and have used it successfully.
related to: #1108