From 324a910861b5925a52dcf222d384d382092bf160 Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 14:14:50 +0200
Subject: [PATCH 1/6] update submodule to include new aws screenshot

---
 docs/artwork | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/artwork b/docs/artwork
index 0cc571caa..e56f35274 160000
--- a/docs/artwork
+++ b/docs/artwork
@@ -1 +1 @@
-Subproject commit 0cc571caa600ea21ca6c71145c4ebbb38203871b
+Subproject commit e56f352741dadb585d147e28304cf530086e018b

From 164eba2e998e2dc2f8407fd701e70b906735bdaf Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 14:16:48 +0200
Subject: [PATCH 2/6] Update S3 walkthrough with steps from #1224

As Amazon S3 deprecated the ACL functionality that the use case and a
git-annex initremote parameter relied on, the walkthrough broke. Thanks to
@NickleDave for the heads-up and the fixes, which should make it work again
given a recent enough git-annex.
---
 docs/basics/101-139-s3.rst | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/docs/basics/101-139-s3.rst b/docs/basics/101-139-s3.rst
index 71ba2fdd1..91bacc01f 100644
--- a/docs/basics/101-139-s3.rst
+++ b/docs/basics/101-139-s3.rst
@@ -156,6 +156,7 @@ Initialize the S3 special remote
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
 The steps below have been adapted from instructions provided on `git-annex documentation `_.
+For more info on the S3 special remote, see `the s3 special remote manpage `_.
 
 By initializing the special remote, what actually happens in the background is that a
 :term:`sibling` is added to the DataLad dataset. This can be verified
@@ -178,7 +179,7 @@ it will be used again later.
 
    $ BUCKET=sample-neurodata-public
    $ git annex initremote public-s3 type=S3 encryption=none \
-       bucket=$BUCKET public=yes datacenter=EU autoenable=true
+       bucket=$BUCKET datacenter=EU autoenable=true
    initremote public-s3 (checking bucket...) (creating bucket in EU...) ok
    (recording state in git...)
 
@@ -188,7 +189,6 @@ The options used in this example include:
 - ``type=S3``: the type of special remote (git-annex can work with many `special remote types `_)
 - ``encryption=none``: no encryption (alternatively enable ``encryption=shared``, meaning files will be encrypted on S3, and anyone with a clone of the git repository will be able to download and decrypt them)
 - ``bucket=$BUCKET``: the name of the bucket to be created on S3 (using the declared variable)
-- ``public=yes``: Set to "yes" to allow public read access to files sent to the S3 remote
 - ``datacenter=EU``: specify where the data will be located; here we set "EU" which is EU/Ireland a.k.a. ``eu-west-1`` (defaults to "US" if not specified)
 - ``autoenable=true``: git-annex will attempt to enable the special remote when it is run in a new clone, implying that users won't have to run extra steps when installing the dataset with DataLad
 
@@ -209,6 +209,22 @@ to "Buckets" to see your newly created bucket. It should only have a single
 
    A newly created public S3 bucket
 
+By default, this bucket and its contents are not publicly accessible.
+To make them public, switch to the "Permissions" tab in your buckets S3 console overview, and turn the option "Block all public access" off.
+
+.. figure:: ../artwork/src/aws_s3_bucket_permissions.png
+
+   Bucket settings allow making the bucket public
+
+.. find-out-more:: Info on public buckets created prior to April 2023
+
+   Amazon S3 buckets created after April 2023 had support for using ACLs for public read access to files.
+   This functionality has since been deprecated, and only remains for legacy buckets.
+   When dealing with an old S3 bucket using ACLs like that, it is possible to use the deprecated ``public`` parameter and set it to "yes".
+
+   - ``public=yes``: Set to "yes" to allow public read access to files sent to the S3 remote
+
+
 Lastly, for git-annex to be able to download files from the bucket without requiring your AWS
 credentials, it needs to know where to find the bucket. We do this by setting the bucket URL,
 which takes a standard format incorporating the bucket name and location (see the code block below).
@@ -235,7 +251,7 @@ option. For consistency, we'll give the GitHub repository the same name as the d
 .. code-block:: console
 
    $ datalad create-sibling-github -d . neuro-data-s3 \
-      --publish-depends public-s3
+      --publish-depends public-s3 --access-protocol ssh
    [INFO ] Configure additional publication dependency on "public-s3"
    .: github(-) [https://github.com/jsheunis/sample-neuro-data.git (git)]
    'https://github.com/jsheunis/sample-neuro-data.git' configured as sibling 'github' for Dataset(/Users/jsheunis/Documents/neuro-data-s3)

From b4a1fadac02aa3c65f5e6a31844d4981d422abcc Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 14:48:47 +0200
Subject: [PATCH 3/6] improve wording
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Co-authored-by: Christian Mönch
---
 docs/basics/101-139-s3.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/basics/101-139-s3.rst b/docs/basics/101-139-s3.rst
index 91bacc01f..e9dc59ce4 100644
--- a/docs/basics/101-139-s3.rst
+++ b/docs/basics/101-139-s3.rst
@@ -218,7 +218,7 @@ To make them public, switch to the "Permissions" tab in your buckets S3 console
 
 .. find-out-more:: Info on public buckets created prior to April 2023
 
-   Amazon S3 buckets created after April 2023 had support for using ACLs for public read access to files.
+   Amazon S3 buckets created before April 2023 supported using ACLs for public read access to files.
    This functionality has since been deprecated, and only remains for legacy buckets.
    When dealing with an old S3 bucket using ACLs like that, it is possible to use the deprecated ``public`` parameter and set it to "yes".
 

From be2932252505ff8bd5f64181fb7b68d7f9e700cc Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 15:38:26 +0200
Subject: [PATCH 4/6] add version note about required git-annex

---
 docs/basics/101-139-s3.rst | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/basics/101-139-s3.rst b/docs/basics/101-139-s3.rst
index 91bacc01f..3455f5be7 100644
--- a/docs/basics/101-139-s3.rst
+++ b/docs/basics/101-139-s3.rst
@@ -3,6 +3,12 @@
 Walk-through: Amazon S3 as a special remote
 -------------------------------------------
 
+.. importantnote:: This walk-through requires git-annex >= 10.20230802
+
+   Prior versions of git-annex do not support public access via the ``publicurl`` parameter with S3 buckets created after April 2023.
+   Find out more about this in `this discussion `_.
+
+
 `Amazon S3 `_ (or Amazon Simple Storage Service) is a popular service by
 `Amazon Web Services `_ (AWS) that provides object storage through a web service interface.
 An S3 bucket can be

From c6026c2c436cc7af52a06e6ab08dff5d85eafc46 Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 15:55:55 +0200
Subject: [PATCH 5/6] update submodule for bucket policy snapshot

---
 docs/artwork | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/artwork b/docs/artwork
index e56f35274..b95b024a0 160000
--- a/docs/artwork
+++ b/docs/artwork
@@ -1 +1 @@
-Subproject commit e56f352741dadb585d147e28304cf530086e018b
+Subproject commit b95b024a0410f3ab0d225b103e4a7212dfa300c0

From 4faca8c51becb667734f1ae65494b6720de7b6e3 Mon Sep 17 00:00:00 2001
From: Adina Wagner
Date: Thu, 4 Jul 2024 15:57:08 +0200
Subject: [PATCH 6/6] Add bucket policy alternative

---
 docs/basics/101-139-s3.rst | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/docs/basics/101-139-s3.rst b/docs/basics/101-139-s3.rst
index ec882c8d8..6bf6c528c 100644
--- a/docs/basics/101-139-s3.rst
+++ b/docs/basics/101-139-s3.rst
@@ -222,6 +222,28 @@ To make them public, switch to the "Permissions" tab in your buckets S3 console
 
    Bucket settings allow making the bucket public
 
+Alternatively, create a bucket policy as shown below, inserting your own bucket name into the two placeholders::
+
+   {
+     "Version": "2012-10-17",
+     "Statement": [
+       {
+         "Effect": "Allow",
+         "Principal": "*",
+         "Action": "s3:GetObject",
+         "Resource": [
+           "arn:aws:s3:::YOUR-BUCKET-NAME-HERE",
+           "arn:aws:s3:::YOUR-BUCKET-NAME-HERE/*"
+         ]
+       }
+     ]
+   }
+
+.. figure:: ../artwork/src/aws_s3_bucket_policy.png
+
+   Bucket policy to allow objects in the bucket to be retrieved by anyone.
+
+
 .. find-out-more:: Info on public buckets created prior to April 2023
 
    Amazon S3 buckets created before April 2023 supported using ACLs for public read access to files.
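
The bucket policy added in patch 6/6 has two placeholders that both need the same bucket name. As a minimal sketch, assuming one prefers to generate the document rather than edit it by hand, the same policy could be built with Python's standard library. The helper name `public_read_policy` is hypothetical and not part of the patch series or of any AWS tooling; `sample-neurodata-public` is the bucket name from the walkthrough's `initremote` example.

```python
import json


def public_read_policy(bucket_name: str) -> dict:
    """Return the public-read policy from the patch, filled in for one bucket.

    Hypothetical helper for illustration only; it mirrors the JSON in the
    patch and substitutes the bucket name into both Resource entries.
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "s3:GetObject",
                # Same two resources as in the patch: the bucket and its objects.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Print the policy as JSON, ready to paste into the bucket's policy editor.
    print(json.dumps(public_read_policy("sample-neurodata-public"), indent=2))
```

Generating the document this way only avoids a mismatch between the two placeholders; the resulting policy still has to be attached to the bucket through the S3 console's "Permissions" tab as described in the patch.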