Replace EarthAccessFile with classes inheriting from specific AbstractBufferedFile subclasses #828

Draft: wants to merge 3 commits into base: main
itcarroll (Collaborator) commented Sep 30, 2024

Fixes #610

This PR provides an "explicit is better than implicit" fix for the method resolution order (MRO) of the (now former) EarthAccessFile (#610). The bug was that EarthAccessFile tried to proxy both s3fs.S3File and fsspec.implementations.http.HTTPFile, but failed to proxy any method that is also defined on fsspec.spec.AbstractBufferedFile, the superclass from which all three inherited. My first attempted fix (PR #620) removed fsspec.spec.AbstractBufferedFile from the MRO of EarthAccessFile. While this did proxy methods correctly, usage was broadly broken because each EarthAccessFile was no longer an instance of the file-like object it proxied (which matters, in particular for xarray.open_dataset!).
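The two failure modes described above can be reproduced with stand-in classes. This is a minimal sketch, not the earthaccess code: it assumes the proxy forwarded attribute access through `__getattr__`, and `Base`, `Concrete`, `ProxyWithBase`, and `ProxyWithoutBase` are hypothetical names standing in for AbstractBufferedFile, a concrete file class such as S3File, and the two proxy designs.

```python
class Base:
    """Stand-in for fsspec.spec.AbstractBufferedFile (hypothetical)."""

    def read(self):
        return "Base.read"


class Concrete(Base):
    """Stand-in for a concrete file class such as S3File or HTTPFile."""

    def read(self):
        return "Concrete.read"

    def extra(self):
        return "Concrete.extra"


class ProxyWithBase(Base):
    """Proxy that inherits from Base, like the design described in #610."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):
        # Python calls __getattr__ only when normal lookup fails, so any
        # attribute found on Base through the MRO never reaches this method.
        return getattr(self.wrapped, name)


class ProxyWithoutBase:
    """Proxy with Base removed from the MRO, like the first attempted fix."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):
        return getattr(self.wrapped, name)


with_base = ProxyWithBase(Concrete())
without_base = ProxyWithoutBase(Concrete())

print(with_base.extra())               # Concrete.extra (forwarded)
print(with_base.read())                # Base.read (found on Base, not forwarded)
print(without_base.read())             # Concrete.read (forwarded)
print(isinstance(without_base, Base))  # False (breaks isinstance checks)
```

Neither proxy is both correctly forwarding and an instance of the wrapped class's hierarchy, which is the gap that subclassing the concrete file classes directly is meant to close.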

Why proxy at all? The file-like objects that earthaccess.open provides must be pickleable AND the un-pickling process needs to be able to change the particular type of file-like object used, an essential part of the hand-off of file-like objects between xarray running locally and dask workers in the cloud.

What this PR does:

  • replaces class EarthAccessFile(fsspec.spec.AbstractBufferedFile) with a suite of (currently three) classes that could be returned by fs.open where fs is either an S3FileSystem or an HTTPFileSystem.
    • EarthaccessS3File(EarthaccessMixin, S3File)
    • EarthaccessHTTPFile(EarthaccessMixin, HTTPFile)
    • EarthaccessHTTPStreamFile(EarthaccessMixin, HTTPStreamFile)
  • adds the EarthaccessMixin to implement the __reduce__ method for custom pickling
  • replaces the logic of the __reduce__ callable, the make_instance function, to avoid "double pickling" (no more calls to dumps)
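The pattern in the list above can be sketched with stand-in classes. This is an illustration under stated assumptions rather than the code in this PR: `FakeBufferedFile`, `FakeS3File`, `FakeHTTPFile`, `SketchMixin`, and the `RUNNING_IN_CLOUD` flag are hypothetical, and this `make_instance` takes a simplified argument list that is not the earthaccess signature.

```python
import pickle

# Hypothetical flag standing in for "is this process running in the cloud?"
RUNNING_IN_CLOUD = False


class FakeBufferedFile:
    """Stand-in for fsspec.spec.AbstractBufferedFile (hypothetical)."""

    def __init__(self, path):
        self.path = path


class FakeS3File(FakeBufferedFile):
    """Stand-in for s3fs.S3File."""


class FakeHTTPFile(FakeBufferedFile):
    """Stand-in for fsspec.implementations.http.HTTPFile."""


def make_instance(path):
    """Rebuild a file at un-pickling time, choosing the concrete class then.

    Simplified stand-in: it receives plain data (a path), not bytes that
    were already pickled, so nothing is pickled twice.
    """
    cls = SketchS3File if RUNNING_IN_CLOUD else SketchHTTPFile
    return cls(path)


class SketchMixin:
    """Adds custom pickling to whichever concrete file class it is mixed into."""

    def __reduce__(self):
        # pickle stores a reference to make_instance plus these arguments.
        return make_instance, (self.path,)


class SketchS3File(SketchMixin, FakeS3File):
    pass


class SketchHTTPFile(SketchMixin, FakeHTTPFile):
    pass


local_file = SketchHTTPFile("granule.nc")
payload = pickle.dumps(local_file)

RUNNING_IN_CLOUD = True  # pretend the payload has arrived on a cloud worker
remote_file = pickle.loads(payload)

print(type(local_file).__name__, "->", type(remote_file).__name__)
print(isinstance(remote_file, FakeS3File), isinstance(remote_file, FakeBufferedFile))
```

Because each class subclasses a concrete file class, instances pass isinstance checks against that class and its base, and every inherited method resolves through the ordinary MRO with no proxying.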

NOTE: Because #620 was merged, the "Files changed" tab in this PR is misleading relative to the original implementation. The best commit to reference for comparison is from right before the merge.

Still in progress:

  • documentation (both user documentation and code comments)
  • the new tests need to be integration tests rather than unit tests (I think) so that they can use Earthdata authentication

📚 Documentation preview 📚: https://earthaccess--828.org.readthedocs.build/en/828/

@itcarroll changed the title from "Earthaccess file" to "Replace EarthAccessFile with a suite of classes that inherit from specific AbstractBufferedFile subclasses" Sep 30, 2024
@itcarroll changed the title from "Replace EarthAccessFile with a suite of classes that inherit from specific AbstractBufferedFile subclasses" to "Replace EarthAccessFile with classes inheriting from specific AbstractBufferedFile subclasses" Sep 30, 2024
Development

Successfully merging this pull request may close these issues.

method resolution surprise on EarthAccessFile