-
-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add SimpleS3Client::copy #1592
base: master
Are you sure you want to change the base?
Add SimpleS3Client::copy #1592
Conversation
$error = null; | ||
foreach ($responses as $idx => $response) { | ||
try { | ||
/** @var CopyPartResult $copyPartResult */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is totally useless. The getter already define its return type properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the static analysis or the IDE does not properly detect the type of $response
based on assignments in the array, the acceptable thing would be added @var array<int, UploadPartCopyOutput> $responses
on the initialization of the empty array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is the same situation,
exception is handled in the same place
return; | ||
} | ||
|
||
/** @var string $uploadId */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is useless. getUploadId
already define its proper return type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello.
How can I handle this errors
https://github.com/async-aws/aws/actions/runs/6642786472/job/18048337300?pr=1592
without doc blocs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if the AWS API can indeed return string|null
for the upload id, you must actually handle the case of null
properly instead of pretending that the type inference is wrong and forcing the type checkers to consider the variable as having the type string
.
If the AWS API actually returns null
, the code would still be broken at runtime and the type override added there only prevents the type checker for detecting the broken code. It does not make it less broken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand from AWS DOC (https://docs.aws.amazon.com/AmazonS3/latest/API/API_CreateMultipartUpload.html)
If the action is successful, the service sends back an HTTP 200 response.
Upload Id will be always string, otherwise exception would be thrown
underlying client handle case with uploadId == null and throws an exception.
I don't see why this code should be broken, exception will be thown any way
the solution was taken from here
https://github.com/async-aws/aws/blob/master/src/Integration/Aws/SimpleS3/src/SimpleS3Client.php#L104
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stof any updates on this?
foreach ($responses as $idx => $response) { | ||
try { | ||
$copyPartResult = $response->getCopyPartResult(); | ||
$parts[] = new CompletedPart(['ETag' => $copyPartResult->getEtag(), 'PartNumber' => $idx]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The weird thing about the PossiblyNullReference error reported by Psalm here is that https://docs.aws.amazon.com/AmazonS3/latest/API/API_UploadPartCopy.html#API_UploadPartCopy_ResponseElements documents CopyPartResult as required in the output. So it is weird that the operation was generated with a nullable type. Maybe the AWS SDK has metadata that does not match the documentation there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems there are no "required" fields for outputs
https://raw.githubusercontent.com/aws/aws-sdk-php/3.283.11/src/data/s3/2006-03-01/api-2.json
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest reporting this to AWS as it looks like a mismatch between the human-readable documentation and the machine-readable data used to generate the SDKs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what should I do?
Add additional checking? Or will wait for AWS fix?
We are waiting for this functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@stof any updates on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @stof
I wanted to bring to your attention an ongoing issue regarding AWS that was posted approximately 4 months ago. It seems that progress on resolving this matter has been slower than anticipated.
Considering the delay in fixing the manifests, I'd like to propose a potential solution. Would it be feasible to remove the annotations related to this issue from the code and handle the error suppression in the Psalm baseline instead? I've come across a few similar issues in the baseline that could potentially benefit from this approach.
I believe this adjustment could provide a temporary workaround while we await a resolution from the AWS side. However, I'm open to discussing alternative solutions or any concerns you may have regarding this proposal.
Nice feature to have 👍 |
ping |
* CacheControl?: string, | ||
* Metadata?: array<string, string>, | ||
* PartSize?: positive-int, | ||
* Concurrency?: positive-int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use int
, this syntax is not a standard, therefore not understood by all IDE.
$sourceHead = $this->headObject(['Bucket' => $srcBucket, 'Key' => $srcKey]); | ||
$contentLength = (int) $sourceHead->getContentLength(); | ||
$options['ContentType'] = $sourceHead->getContentType(); | ||
$concurrency = (int) ($options['Concurrency'] ?? 10); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need to cast, as it's an int by contract. Let people deal with PHP runtime errors if they misuse the method.
@@ -32,6 +32,9 @@ $resource = \fopen('/path/to/cat/image.jpg', 'r'); | |||
$s3->upload('my-image-bucket', 'photos/cat_2.jpg', $resource); | |||
$s3->upload('my-image-bucket', 'photos/cat_2.txt', 'I like this cat'); | |||
|
|||
// Copy objects between buckets | |||
$s3->copy('source-bucket', 'source-key', 'destination-bucket', 'destination-key'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Options parameters deserve a small documentation IMHO
* Metadata?: array<string, string>, | ||
* PartSize?: positive-int, | ||
* Concurrency?: positive-int, | ||
* mupThreshold?: positive-int, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency
* mupThreshold?: positive-int, | |
* MupThreshold?: positive-int, |
unset($options['PartSize']); | ||
|
||
// If file is less than multipart upload threshold, use normal atomic copy | ||
if ($contentLength < $mupThreshold) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need a parameter for this?
Is it an issue to split the file even if it is smaller than the default 2GB?
* files smaller than 64 * 10 000 = 640GB. If you are coping larger files, | ||
* please set PartSize to a higher number, like 128, 256 or 512. (Max 4096). | ||
*/ | ||
$partSize = ($options['PartSize'] ?? 64) * $megabyte; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If PartSize is not defined, I'd rather default to max(64 * $megabyte , 2 ** ceil(log($contentLength / 10000, 2)))
;
} catch (\Throwable $e) { | ||
$error = $e; | ||
|
||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could move the try/catch outside the loop for more readability, and removing the error variable.
try {
foreach ($responses as $idx => $response) {
$parts[] = new CompletedPart(['ETag' => $response->getCopyPartResult()->getEtag(), 'PartNumber' => $idx]);
} catch (\Throwable $e) {
foreach ($responses as $response) {
try {
$response->cancel();
} catch (\Throwable $e) {
continue;
}
}
$this->abortMultipartUpload(AbortMultipartUploadRequest::create(['Bucket' => $destBucket, 'Key' => $destKey, 'UploadId' => $uploadId]));
}
--$parallelChunks; | ||
} | ||
$error = null; | ||
foreach ($responses as $idx => $response) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not efficient. We define 10 concurents upload, but in fact, we are uploading 10, waiting for the 10 to finish, and uploading 10 more, ...
That means:
- we are waiting for the 10th to finish before starting the 9 previous.
- if 1 upload takes time to finish, we don't leverage the 9 others.
A Beter implementation would be to use a pool (or buffer), to fill the pool with 10 responses, and once 1 response is over, add a new one before checking the other responses.
Better: leverage async processing, by not waiting for the end of a response, but checking if the response is over, if not check the next one.
while not all part uploaded
while pool not full
=> start new upload
while pool full
foreach response in pull
if response not finished
continue
else
=> process response
=> remove response from pool
foreach response in pull
=> wait for response to finish
=> process response
=> remove response from pool
SimpleS3Client::copy method added
Method resolve which api it should use by itself (copyObject as atomic or copy by parts)