Zarr Python 3 tracking issue #514

Open
4 tasks
jhamman opened this issue Oct 6, 2024 · 1 comment

Comments

@jhamman
Contributor

jhamman commented Oct 6, 2024

Zarr-Python 3.0 is getting close to a full release. This issue tracks the integration of the 3.0 release with Kerchunk.

Here's a running list of issues we expect we need to solve either here or upstream:

  • Develop an abstraction over Zarr Python's store interface (Zarr-Python 3.0's store is no longer a mutable mapping)
  • Develop a reference-filesystem like store for Zarr
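
To illustrate the first item: code that used to write `store[key] = value` into a dict-like store needs a thin layer once the store exposes async methods instead. The following is a minimal sketch under stated assumptions, not Kerchunk's or Zarr's actual code. `InMemoryAsyncStore`, `put`, and `fetch` are hypothetical names, and the real Zarr-Python 3 store methods take additional arguments (such as buffer types) that are left out here.

```python
import asyncio
from collections.abc import MutableMapping
from typing import Optional


class InMemoryAsyncStore:
    """Hypothetical stand-in for an async store (NOT Zarr's real store class)."""

    def __init__(self) -> None:
        self._data = {}

    async def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


def put(store, key: str, value: bytes) -> None:
    """Write one key to either a dict-like store or an async store."""
    if isinstance(store, MutableMapping):
        store[key] = value
    else:
        # asyncio.run() only works when no event loop is already running
        # in this thread; real code would need a more careful sync bridge.
        asyncio.run(store.set(key, value))


def fetch(store, key: str) -> Optional[bytes]:
    """Read one key from either store flavor, returning None if missing."""
    if isinstance(store, MutableMapping):
        return store.get(key)
    return asyncio.run(store.get(key))
```

Callers can then use `put(store, ".zgroup", b"{}")` without caring which kind of store they were handed.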

Eventually, we may also want to:

  • construct Zarr's metadata classes directly rather than using Zarr's top-level API w/ memory stores
  • support writing v3 metadata (tracked in "Kerchunk and Zarr V3", #235)

xref: #504

@mpiannucci
Contributor

mpiannucci commented Oct 23, 2024

I have been working on Zarr-Python 3 support in Kerchunk in #516, but I need to cycle off of it for a bit. I am going to list what is left to be done and how it can be split up, as best I can, in the hope that someone else can help move this along in the meantime.

Here's where the functionality stands in that PR.

Generating references:

  • HDF
  • netcdf3
  • grib2
  • zarr (not tested, might work)
  • tiff (not tested, might work)
  • others (not sure)

Reading with xarray

Creating a store currently works, but there are caveats:

  • fsspec ReferenceFileSystem async cat_file is broken (see "Fix broken async reference file system _cat_file method", filesystem_spec#1734)
  • Zarr 3 RemoteStore issues: Zarr-Python 3 requires that an fsspec filesystem used for a RemoteStore be an AsyncFileSystem. ReferenceFileSystem supports this, but it also means that the remote filesystem inside the ReferenceFileSystem must be async. As a result, any usage where the data files live on a non-async filesystem will not work. This blocks most of the tests in Kerchunk as it stands today, because LocalFileSystem is not async.
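
The sync/async mismatch in the second caveat can be pictured with a small stand-alone sketch. It does not use fsspec: `SyncFS` and `AsyncWrapper` are hypothetical stand-ins, with names borrowed from fsspec's convention (`cat_file` for the blocking call, `_cat_file` for the coroutine, `async_impl` as the flag). Running the blocking call in a worker thread is one possible way to present a sync filesystem through an async interface; it is an assumption here, not necessarily what #516 or fsspec does.

```python
import asyncio
import os
import tempfile


class SyncFS:
    """Hypothetical blocking filesystem, standing in for a local filesystem."""

    async_impl = False

    def cat_file(self, path, start=None, end=None):
        # Read the byte range [start, end) from a local file.
        start = start or 0
        with open(path, "rb") as f:
            f.seek(start)
            return f.read() if end is None else f.read(end - start)


class AsyncWrapper:
    """Hypothetical adapter exposing a coroutine on top of a blocking filesystem."""

    async_impl = True

    def __init__(self, sync_fs):
        self.sync_fs = sync_fs

    async def _cat_file(self, path, start=None, end=None):
        # Run the blocking read in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.sync_fs.cat_file, path, start, end)


def read_range_demo():
    """Write a small temp file, then read bytes 2..6 through the async wrapper."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"0123456789")
        path = f.name
    try:
        fs = AsyncWrapper(SyncFS())
        return asyncio.run(fs._cat_file(path, start=2, end=6))
    finally:
        os.remove(path)
```

An async consumer would only ever call `_cat_file`, so from its side the wrapped filesystem looks async even though the underlying reads are blocking.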

Codecs

Codecs (filters, compressors) are treated differently in Zarr 3. Zarr 2 stores read with Zarr 3 will still use numcodecs codecs. However, if users ever want to use these codecs with another store (say, to read a GRIB virtual dataset with Icechunk), the codecs from this package need to conform to Zarr's Codec ABC.

  • Codecs support zarr 3 Codec API
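
As a rough picture of what that adaptation involves: numcodecs-style codecs expose blocking `encode`/`decode` methods, while Zarr 3 drives its codecs asynchronously. The sketch below is stand-alone and imports neither Zarr nor numcodecs. `ZlibLikeCodec`, `AsyncCodecAdapter`, and the method names `encode_chunk`/`decode_chunk` are hypothetical, and Zarr's actual Codec ABC has a different, richer interface that is not reproduced here.

```python
import asyncio
import zlib


class ZlibLikeCodec:
    """Hypothetical numcodecs-style codec with blocking encode/decode."""

    codec_id = "zlib_like"  # made-up identifier for illustration

    def __init__(self, level: int = 1) -> None:
        self.level = level

    def encode(self, buf: bytes) -> bytes:
        return zlib.compress(bytes(buf), self.level)

    def decode(self, buf: bytes) -> bytes:
        return zlib.decompress(bytes(buf))


class AsyncCodecAdapter:
    """Hypothetical adapter: wraps a blocking codec behind async methods."""

    def __init__(self, inner) -> None:
        self.inner = inner

    async def encode_chunk(self, chunk: bytes) -> bytes:
        # Offload the blocking call to a worker thread.
        return await asyncio.to_thread(self.inner.encode, chunk)

    async def decode_chunk(self, chunk: bytes) -> bytes:
        return await asyncio.to_thread(self.inner.decode, chunk)


async def roundtrip(adapter: AsyncCodecAdapter, payload: bytes) -> bytes:
    """Encode then decode a payload through the async adapter."""
    encoded = await adapter.encode_chunk(payload)
    return await adapter.decode_chunk(encoded)
```

The point of the adapter shape is that the existing encode/decode logic stays untouched; only the calling convention changes.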

I think that this is mostly it, and I'm happy to help anyone who is interested in driving this forward.
