ENH: Add download utils to allow data download from prem and AWS s3 buckets #489
This should do the trick for the CI and dependency handling: zoghbi-a#1
Codecov Report
@@ Coverage Diff @@
## main #489 +/- ##
==========================================
+ Coverage 80.07% 80.09% +0.01%
==========================================
Files 52 53 +1
Lines 6059 6185 +126
==========================================
+ Hits 4852 4954 +102
- Misses 1207 1231 +24
pyvo/utils/tests/test_download.py
Outdated
with pytest.warns(PyvoUserWarning): | ||
http_download('http://example.com/data/basic.xml', | ||
local_filepath='basic.xml', cache=True) | ||
assert os.path.getsize('basic.xml') == 901 |
I suppose some content-related test is needed instead of size, to make Windows happy.
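A minimal sketch of what a content-based check could look like. It assumes the size mismatch on Windows comes from line-ending differences (CRLF versus LF), which the comment above does not state explicitly. The helper name `same_text_content` and the sample XML are hypothetical and not part of the PR.

```python
from pathlib import Path
import tempfile


def same_text_content(path_a, path_b):
    """Return True if two text files match once CRLF is normalised to LF.

    Hypothetical helper for illustration; not part of pyvo.
    """
    # Read raw bytes so Python does no newline translation of its own.
    content_a = Path(path_a).read_bytes().replace(b"\r\n", b"\n")
    content_b = Path(path_b).read_bytes().replace(b"\r\n", b"\n")
    return content_a == content_b


# Two copies of the same (made-up) document with different line endings.
with tempfile.TemporaryDirectory() as tmp:
    unix_copy = Path(tmp) / "unix.xml"
    windows_copy = Path(tmp) / "windows.xml"
    unix_copy.write_bytes(b"<a>\n<b/>\n</a>\n")
    windows_copy.write_bytes(b"<a>\r\n<b/>\r\n</a>\r\n")

    # A size assertion would fail here, while a content comparison passes.
    sizes_differ = unix_copy.stat().st_size != windows_copy.stat().st_size
    contents_match = same_text_content(unix_copy, windows_copy)
```

In the test under review, the same idea would mean comparing the downloaded `basic.xml` against the reference data rather than asserting a fixed byte count.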
888476d
to
ef906c9
Compare
Docs review
docs/utils/download.rst
Outdated
Example Usage | ||
============== | ||
```python | ||
data_url = 'https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg' | ||
image_file = http_download(url=data_url) | ||
|
||
s3_uri = 's3://nasa-heasarc/chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg' | ||
image2_file = aws_download(uri=s3_uri) | ||
``` | ||
or | ||
```python | ||
s3_key = 'chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg' | ||
s3_bucket = 'nasa-heasarc' | ||
image2_file = aws_download(bucket=s3_bucket, key=s3_key) | ||
``` | ||
|
||
If the aws data requires authentication, a credential profile (e.g. `aws_user` profile in ``~/.aws/credentials``) can be passed | ||
```python | ||
image2_file = aws_download(bucket=s3_bucket, key=s3_key, aws_profile='aws_user') | ||
``` | ||
A session (instance of ``boto3.session.Session``) can also be passed instead (see details in `AWS session documentation`_). | ||
```python | ||
s3_session = boto3.session.Session(aws_access_key_id, aws_secret_access_key) | ||
image2_file = aws_download(bucket=s3_bucket, key=s3_key, session=s3_session) | ||
``` |
Look at the other rst files to reformat these into rst rather than markdown, and add the `>>>`, too, so they can be doctested (also add the remote-data directives).
docs/utils/download.rst
Outdated
it obtained. These can be considered an advanced version of `~pyvo.dal.Record.getdataset` that can handle | ||
data from standard on-prem servers as well as cloud data. For now only AWS is supported. | ||
|
||
There are two methods with the same call signature: `http_download` and `aws_download`. The first handles |
Use full API links, otherwise sphinx will complain (we don't have nitpicky turned on yet, but things like this will eventually cause CI failures).
``` | ||
|
||
.. _Amazon S3 storage: https://aws.amazon.com/s3/ | ||
.. _AWS session documentation: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/core/session.html |
Add api reference at the bottom of the page
docs/utils/download.rst
Outdated
Example Usage | ||
============== | ||
.. code-block:: python | ||
|
||
data_url = 'https://heasarc.gsfc.nasa.gov/FTP/chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg' | ||
image_file = http_download(url=data_url) | ||
|
||
s3_uri = 's3://nasa-heasarc/chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg' | ||
image2_file = aws_download(uri=s3_uri) |
Look at the other rst file for examples of how to write rst code blocks (rather than markdown). Also add empty lines after the headings, and add `>>>` for the doctesting to pick these up, and the remote data directives, too.
docs/utils/download.rst
Outdated
image2_file = aws_download(bucket=s3_bucket, key=s3_key) | ||
|
||
|
||
If the aws data requires authentication, a credential profile (e.g. `aws_user` profile in ``~/.aws/credentials``) can be passed: |
Don't use single backticks for highlighting; all of those will fail to link to a URL or sphinx reference. Either use double backticks, or use a full sphinx-resolvable reference.
f88d04d
to
e7f5571
Compare
I think that the remaining CI failure is not related to this PR.
68659c3
to
398be70
Compare
cd8ad8c
to
d2aaa04
Compare
d2aaa04
to
ac47c96
Compare
There are some open questions for this, so we will close it for now and revisit the issues later.
This PR adds two utility methods: `http_download` and `aws_download`. They have the same call signature and allow data download from on-prem addresses (through http urls) or from AWS s3 URI.

These methods enhance what is available with `dal.Record.getdataset` for on-prem data access. When a standard for serving cloud data using the VO is adopted in the future, these util methods may be absorbed into `dal.Record.getdataset`.

For relevant discussion on cloud data, see PR #369.
The current PR attempts to simplify #369.
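The doc examples in this PR call `aws_download` either with a full S3 URI or with a separate bucket and key. As a rough illustration of how the two forms relate, here is a sketch that splits an S3 URI into those parts using only the standard library. The helper `split_s3_uri` is hypothetical; it is not taken from the PR and may differ from how `aws_download` actually handles its arguments.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/key' into (bucket, key).

    Hypothetical helper for illustration; not the PR's implementation.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the network location; the key is the path without
    # its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


s3_uri = 's3://nasa-heasarc/chandra/data/byobsid/2/3052/primary/acisf03052N004_cntr_img2.jpg'
s3_bucket, s3_key = split_s3_uri(s3_uri)
```

With the URI from the docs, this gives the same `s3_bucket` and `s3_key` values that the second doc example passes explicitly.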