DD-2271 separate clients for upload and download#244
Open
janvanmansum wants to merge 4 commits into
Open
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors S3AccessIO to use separate AWS S3 async clients for download (read) vs upload (write), enabling multipart configuration only for upload operations while leaving read operations on a non-multipart client.
Changes:
- Split the single
S3AsyncClientintos3ReadClientands3WriteClient, and routed S3 operations to the appropriate client. - Introduced separate static client caches for upload vs download clients and updated TransferManager to use the upload client.
- Minor cleanups (e.g.,
isEmpty()usage, lessprintStackTraceusage).
Comments suppressed due to low confidence (2)
src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java:1054
- S3Presigner is obtained from a static cache (getPresigner/driverPresignerMap) but is closed in this method’s finally block later. Closing a cached/shared presigner can cause future upload URL generation to fail. Prefer either per-call presigner construction (and close) or cached presigner reuse without closing (or evict/recreate after closing).
if (s3ReadClient == null || s3WriteClient == null) {
throw new IOException("ERROR: s3 not initialised. ");
}
src/main/java/edu/harvard/iq/dataverse/dataaccess/S3AccessIO.java:1117
- This method closes s3Presigner at the end of multipart URL generation, but s3Presigner is sourced from a static cache (driverPresignerMap). Closing it here can break later presign calls for other requests/threads. Consider using a non-cached presigner for these operations, or keep it cached and remove all close() calls on the shared instance.
// Use the existing s3 async client for the createMultipartUpload operation
CompletableFuture<CreateMultipartUploadResponse> createMultipartUploadFuture = s3WriteClient.createMultipartUpload(createMultipartUploadRequestBuilder.build());
CreateMultipartUploadResponse createMultipartUploadResponse = createMultipartUploadFuture.join();
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes DD-2271
Description of changes
S3AsyncClientinS3AccessIOin two separated clients: one for read and one for write actions, with multipart enabled on the write-client and disabled on the read-client. In many calls it does not really whether multipart is enabled, but for readability I stuck to the logic that read actions are performed by the read client and write actions by the write client.Notify
@DANS-KNAW/core-systems