Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add aws utils #187

Open
wants to merge 33 commits into
base: master
Choose a base branch
from
Open

Add aws utils #187

wants to merge 33 commits into from

Conversation

aschroed
Copy link
Member

@aschroed aschroed commented Mar 7, 2022

Adds some generally useful aws functions (currently all s3 operations) - haven't had time to complete unit tests for all functions though several functions are covered.

Perhaps similar functions may have been added to other utils packages by others?

I'd like to be able to use these functions for open data operations as I refactor some of that code. so would love to get this merged in

@aschroed aschroed requested a review from netsettler March 7, 2022 21:14
logger = logging.getLogger(__name__)


def s3_bucket_head(*, bucket_name, s3=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would reorder arguments here so s3 comes first, and rename it s3_client

:return: dict: head response or None
"""
try:
s3 = s3 or s3Utils().s3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not do this this way as it will implicitly pull in a bunch of other things when calling s3Utils() - I would just do boto3.client('s3'). This comment applies to many functions throughout here.

Comment on lines 36 to 46
def s3_bucket_exists(*, bucket_name, s3=None):
""" Does a bucket exist?

:param bucket_name: name of the bucket - string
:param s3: AWS s3 client
:return: boolean - True if exists, False if not
"""
return bool(s3_bucket_head(bucket_name=bucket_name, s3=s3))


def s3_bucket_object_count(bucket_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're using the s3 convention inconsistently - I would recommend having it everywhere and writing a small helper that does the client initialization, if necessary.

"""
bucket = boto3.resource('s3').Bucket(bucket_name)
# get only head of objects so we can count them
return sum(1 for _ in bucket.objects.all())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this work recursively or will it only count folders at top level? Thinking about how it would count things in our files/wfoutput buckets for example which all have a folder based hierarchy.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless I'm missing something s3 does not store objects with a hierarchy (folders are just a convenience in console display?) so this would count all objects in the given bucket regardless of folder. I expect if empty folders were created then these would be included in the count so maybe checking that 0 size objects are not included may be warranted? If I am correct about this then what may indeed be useful is to support passing in a prefix and only return a count of those objects that share said prefix - for example in the dcic account in the '4dn-dcic-open-data-transfer-logs' bucket if a parameter '2022-08-01' was provided then a count of the 32 object in that folder/with that shared prefix would be returned.

return bool(s3_object_head(object_key=object_key, bucket_name=bucket_name, s3=s3))


def s3_put_object(*, object_key, obj, bucket_name, acl=None, s3=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should handle the kms_key_id case here as well, despite it not being used in Fourfront

@coveralls
Copy link

coveralls commented Oct 17, 2022

Pull Request Test Coverage Report for Build 3275617373

  • 49 of 134 (36.57%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.5%) to 72.461%

Changes Missing Coverage Covered Lines Changed/Added Lines %
dcicutils/bucket_utils.py 36 121 29.75%
Totals Coverage Status
Change from base Build 3198667051: -0.5%
Covered Lines: 5736
Relevant Lines: 7916

💛 - Coveralls

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants