Error trying to load local zarr dataset #634

Open
dstansby opened this issue Sep 19, 2024 · 10 comments
Labels: bug, zarr

@dstansby

from cloudvolume import CloudVolume
vol = CloudVolume('zarr://file:///Users/dstansby/notebooks/hipct/meshgen/4')

gives me:

---------------------------------------------------------------------------
UnsupportedProtocolError                  Traceback (most recent call last)
File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:245, in CloudVolume.__new__(cls, cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, provenance, compress, compress_level, non_aligned_writes, parallel, delete_black_uploads, background_color, green_threads, use_https, max_redirects, mesh_dir, skel_dir, agglomerate, secrets, spatial_index_db, lru_bytes, cache_locking)
    244 try:
--> 245   return init('zarr2://' + cloudpath)
    246 except:

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:232, in CloudVolume.__new__.<locals>.init(cloudpath)
    231 def init(cloudpath):
--> 232   path = strict_extract(cloudpath)
    233   if path.format in REGISTERED_PLUGINS:

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/paths.py:113, in strict_extract(cloudpath, windows, disable_toabs)
    106 """
    107 Same as cloudvolume.paths.extract, but raise an additional 
    108 cloudvolume.exceptions.UnsupportedProtocolError
   (...)
    111 Returns: ExtractedPath
    112 """
--> 113 path = extract(cloudpath, windows, disable_toabs)
    115 if path.dataset == '' or path.layer == '':

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/paths.py:155, in extract(cloudpath, windows, disable_toabs)
    153   abspath = toabs    
--> 155 fmt, protocol, cloudpath = extract_format_protocol(cloudpath)
    157 split_char = '/'

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/paths.py:89, in extract_format_protocol(cloudpath)
     88 if proto in ALLOWED_FORMATS:
---> 89   raise error # e.g. gs://graphene://
     90 elif proto in ALLOWED_PROTOCOLS:

UnsupportedProtocolError: 
Cloud Path must conform to format://PROTOCOL://BUCKET/PATH
Examples: 
  precomputed://gs://test_bucket/em
  gs://test_bucket/em
  graphene://https://example.com/image/em

Supported Formats: None (precomputed), graphene, precomputed, boss, n5, zarr, zarr2, zarr3
Supported Protocols: gs, file, s3, http, https, mem, middleauth+https, ngauth+https, matrix, tigerdata

Cloud Path Recieved: zarr2://zarr://file:///Users/dstansby/notebooks/hipct/meshgen/4


During handling of the above exception, another exception occurred:

InfoUnavailableError                      Traceback (most recent call last)
Cell In[13], line 1
----> 1 vol = CloudVolume('zarr://file:///Users/dstansby/notebooks/hipct/meshgen/4')

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:247, in CloudVolume.__new__(cls, cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, provenance, compress, compress_level, non_aligned_writes, parallel, delete_black_uploads, background_color, green_threads, use_https, max_redirects, mesh_dir, skel_dir, agglomerate, secrets, spatial_index_db, lru_bytes, cache_locking)
    245     return init('zarr2://' + cloudpath)
    246   except:
--> 247     raise err
    248 else:
    249   raise err

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:241, in CloudVolume.__new__(cls, cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, provenance, compress, compress_level, non_aligned_writes, parallel, delete_black_uploads, background_color, green_threads, use_https, max_redirects, mesh_dir, skel_dir, agglomerate, secrets, spatial_index_db, lru_bytes, cache_locking)
    236     raise UnsupportedFormatError(
    237       "Unknown format {}".format(path.format)
    238     )
    240 try:
--> 241   return init(cloudpath)
    242 except InfoUnavailableError as err:
    243   if 'precomputed://' not in cloudpath:

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:234, in CloudVolume.__new__.<locals>.init(cloudpath)
    232 path = strict_extract(cloudpath)
    233 if path.format in REGISTERED_PLUGINS:
--> 234   return REGISTERED_PLUGINS[path.format](**kwargs)
    235 else:
    236   raise UnsupportedFormatError(
    237     "Unknown format {}".format(path.format)
    238   )

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/__init__.py:47, in create_zarr(cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, compress, compress_level, non_aligned_writes, delete_black_uploads, parallel, green_threads, secrets, cache_locking, **kwargs)
     28 config = SharedConfiguration(
     29   cdn_cache=cdn_cache,
     30   compress=compress,
   (...)
     38   cache_locking=cache_locking,
     39 )
     40 cache = CacheService(
     41   cloudpath=get_cache_path(cache, cloudpath),
     42   enabled=bool(cache),
     43   config=config,
     44   compress=compress_cache,
     45 )
---> 47 meta = ZarrMetadata(cloudpath, config=config, cache=cache, info=info)
     48 imagesrc = ZarrImageSource(
     49   config, meta, cache, 
     50   autocrop=bool(autocrop),
   (...)
     53   fill_missing=bool(fill_missing),
     54 )
     56 return CloudVolumePrecomputed(
     57   meta, cache, config, 
     58   imagesrc, mesh=None, skeleton=None,
     59   mip=mip
     60 )

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:50, in ZarrMetadata.__init__(self, cloudpath, config, cache, info)
     47 self.zattrs = self.default_zattrs()
     49 if orig_info is None:
---> 50   self.info = self.fetch_info()
     51 else:
     52   self.render_zarr_metadata()

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:427, in ZarrMetadata.fetch_info(self)
    424 if res is not None:
    425   self.zarrays.extend(res)
--> 427 return self.zarr_to_info(self.zarrays, self.zattrs)

File ~/notebooks/hipct/meshgen/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:368, in ZarrMetadata.zarr_to_info(self, zarrays, zattrs)
    363 num_channels = len([ 
    364   chan for chan in zattrs["omero"]["channels"] if chan["active"] 
    365 ])
    367 if not zarrays:
--> 368   raise exceptions.InfoUnavailableError()
    370 base_res = self.spatial_resolution_in_nm(0, zattrs, zarrays)
    372 info = PrecomputedMetadata.create_info(
    373   num_channels=num_channels,
    374   layer_type='image',
   (...)
    380   chunk_size=zarrays[0]["chunks"][2:][::-1],
    381 )

InfoUnavailableError:

Somewhere in that message it says Cloud Path Recieved: zarr2://zarr://file:///Users/dstansby/notebooks/hipct/meshgen/4, which doesn't seem right: there are two format prefixes on the front (zarr and zarr2), even though I only passed zarr in the CloudVolume call.

william-silversmith added the bug label on Sep 19, 2024
@william-silversmith
Contributor

Dang, that's pretty weird! I'll have to fix that. Thanks for reporting.

@william-silversmith
Contributor

I looked into this a bit more, and while the error is confusing, what is happening is that a .zarray file is missing, so it falls back to trying other formats.
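The traceback suggests where the doubled prefix comes from: when the first attempt raises InfoUnavailableError, CloudVolume.__new__ retries with 'zarr2://' + cloudpath (line 245), and the retry's UnsupportedProtocolError is chained ahead of the original error. Below is a simplified, hypothetical model of that control flow; open_with_fallback and failing_init are illustrative names, not CloudVolume functions. The newer release's traceback shows an extra `cloudpath[:4] != 'zarr'` check at line 249, which would skip this retry for zarr paths.

```python
# Simplified, hypothetical model of the fallback seen in the traceback
# (cloudvolume.py around lines 240-247); not CloudVolume's real code.


class InfoUnavailableError(Exception):
    """Stand-in for cloudvolume.exceptions.InfoUnavailableError."""


def open_with_fallback(cloudpath, init):
    """Call init(cloudpath); if no metadata is found, retry as zarr2.

    The retry prepends 'zarr2://' unconditionally, so a path that
    already starts with 'zarr://' becomes 'zarr2://zarr://...'.
    """
    try:
        return init(cloudpath)
    except InfoUnavailableError as err:
        if 'precomputed://' in cloudpath:
            raise
        try:
            return init('zarr2://' + cloudpath)
        except Exception:
            # Re-raise the first error; Python chains the retry's error
            # as its context ("During handling of the above exception...").
            raise err


# Usage: an init() that never finds metadata records every attempt.
attempts = []


def failing_init(path):
    attempts.append(path)
    raise InfoUnavailableError(path)


try:
    open_with_fallback('zarr://file:///tmp/4', failing_init)
except InfoUnavailableError:
    pass

print(attempts)  # ['zarr://file:///tmp/4', 'zarr2://zarr://file:///tmp/4']
```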

@dstansby
Author

I'm pretty sure there is a .zarray file:

tree /Users/dstansby/notebooks/hipct/meshgen/4 -a
├── .zarray
├── 0
│   ├── 0
│   │   ├── 0
│   │   ├── 1
│   │   └── 2
│   └── 1
│       ├── 0
│       ├── 1
│       └── 2
└── 1
    ├── 0
    │   ├── 0
    │   ├── 1
    │   └── 2
    └── 1
        ├── 0
        ├── 1
        └── 2

I'll try to put together a simple reproducible example (one that writes some data first) when I have a bit more time.
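The tree above stores chunks as nested directories (0/0/0, 0/0/1, and so on). Under the Zarr v2 specification, a chunk's storage key is its grid indices joined by the array's dimension separator: "." by default (flat files such as 0.0.0) or "/" for nested directories like these. A minimal stdlib sketch of that mapping; chunk_keys is a hypothetical helper, not part of zarr or CloudVolume:

```python
import itertools


def chunk_keys(grid_shape, separator="."):
    """Yield the storage key of every chunk in a grid of `grid_shape` chunks.

    Zarr v2 joins a chunk's grid indices with the dimension separator:
    "." gives flat keys such as "0.1.2", "/" gives nested ones such as "0/1/2".
    """
    for index in itertools.product(*(range(n) for n in grid_shape)):
        yield separator.join(str(i) for i in index)


# The tree above has 2 x 2 x 3 chunk files, i.e. keys "0/0/0" through "1/1/2".
nested = list(chunk_keys((2, 2, 3), separator="/"))
print(len(nested), nested[0], nested[-1])  # 12 0/0/0 1/1/2
```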

@william-silversmith
Contributor

I may be mistaken in my implementation, but in the examples I investigated, there was a .zarray file under each mip level.

For example:

helloworld
    |- .zattrs
    |- 0
    |  |- .zarray
    |- 1
       |- .zarray
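The per-mip layout described above (one .zarray in each numbered subdirectory) can be checked with a small helper. This is a hypothetical sketch of that discovery step, not CloudVolume's actual lookup code. Run against the single-array tree posted earlier it would return an empty list, because there the only .zarray sits at the root and the numbered subdirectories hold chunk files.

```python
from pathlib import Path


def find_mip_arrays(root):
    """Return names of subdirectories of `root` that hold a .zarray file.

    Mirrors the layout root/0/.zarray, root/1/.zarray, ...; integer
    names sort numerically, any other names sort after them.
    """
    levels = [
        p.name
        for p in Path(root).iterdir()
        if p.is_dir() and (p / ".zarray").is_file()
    ]
    return sorted(levels, key=lambda n: (0, int(n), "") if n.isdigit() else (1, 0, n))
```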

@dstansby
Author

Ah, this is a single Zarr array, not a Zarr group (which would have another level of directories containing .zarray files), so maybe that's the issue?
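In the Zarr v2 format, an array directory is marked by a .zarray file and a group by a .zgroup file, with user attributes in an optional .zattrs. A minimal stdlib sketch (zarr_v2_node_kind is a hypothetical helper name) that tells the two cases apart; for the `tree -a` listing posted above, which shows only .zarray at the root, it would report ("array", False).

```python
from pathlib import Path


def zarr_v2_node_kind(path):
    """Classify a local Zarr v2 directory by its metadata files.

    Returns (kind, has_zattrs): kind is "array" if a .zarray file is
    present, "group" if a .zgroup file is present, otherwise None.
    """
    path = Path(path)
    if (path / ".zarray").is_file():
        kind = "array"
    elif (path / ".zgroup").is_file():
        kind = "group"
    else:
        kind = None
    return kind, (path / ".zattrs").is_file()
```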

william-silversmith added the zarr label on Sep 23, 2024
@william-silversmith
Copy link
Contributor

Could you send me a zip file containing a small example? I could then debug it more efficiently.

@william-silversmith
Copy link
Contributor

A new CloudVolume release with improved Zarr support is out. Consider giving it a try!

@dstansby
Author

Hmm, still no luck. Here's a self-contained example that creates an array, then tries to load it:

from cloudvolume import CloudVolume
import zarr

arr = zarr.open("test.zarr", "w", shape=(10, 10, 10), dtype="u4")
arr[:] = 2

vol = CloudVolume('zarr://file://test.zarr')
---------------------------------------------------------------------------
InfoUnavailableError                      Traceback (most recent call last)
Cell In[7], line 1
----> 1 vol = CloudVolume('zarr://file://test.zarr')

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:255, in CloudVolume.__new__(cls, cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, provenance, compress, compress_level, non_aligned_writes, parallel, delete_black_uploads, background_color, green_threads, use_https, max_redirects, mesh_dir, skel_dir, agglomerate, secrets, spatial_index_db, lru_bytes, cache_locking, lru_encoding)
    253     raise err
    254 else:
--> 255   raise err

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:247, in CloudVolume.__new__(cls, cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, provenance, compress, compress_level, non_aligned_writes, parallel, delete_black_uploads, background_color, green_threads, use_https, max_redirects, mesh_dir, skel_dir, agglomerate, secrets, spatial_index_db, lru_bytes, cache_locking, lru_encoding)
    242     raise UnsupportedFormatError(
    243       "Unknown format {}".format(path.format)
    244     )
    246 try:
--> 247   return init(cloudpath)
    248 except InfoUnavailableError as err:
    249   if 'precomputed://' not in cloudpath and cloudpath[:4] != 'zarr':

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/cloudvolume.py:240, in CloudVolume.__new__.<locals>.init(cloudpath)
    238 path = strict_extract(cloudpath)
    239 if path.format in REGISTERED_PLUGINS:
--> 240   return REGISTERED_PLUGINS[path.format](**kwargs)
    241 else:
    242   raise UnsupportedFormatError(
    243     "Unknown format {}".format(path.format)
    244   )

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/__init__.py:47, in create_zarr(cloudpath, mip, bounded, autocrop, fill_missing, cache, compress_cache, cdn_cache, progress, info, compress, compress_level, non_aligned_writes, delete_black_uploads, parallel, green_threads, secrets, cache_locking, **kwargs)
     28 config = SharedConfiguration(
     29   cdn_cache=cdn_cache,
     30   compress=compress,
   (...)
     38   cache_locking=cache_locking,
     39 )
     40 cache = CacheService(
     41   cloudpath=get_cache_path(cache, cloudpath),
     42   enabled=bool(cache),
     43   config=config,
     44   compress=compress_cache,
     45 )
---> 47 meta = ZarrMetadata(cloudpath, config=config, cache=cache, info=info)
     48 imagesrc = ZarrImageSource(
     49   config, meta, cache, 
     50   autocrop=bool(autocrop),
   (...)
     53   fill_missing=bool(fill_missing),
     54 )
     56 return CloudVolumePrecomputed(
     57   meta, cache, config, 
     58   imagesrc, mesh=None, skeleton=None,
     59   mip=mip
     60 )

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:50, in ZarrMetadata.__init__(self, cloudpath, config, cache, info)
     47 self.zattrs = self.default_zattrs()
     49 if orig_info is None:
---> 50   self.info = self.fetch_info()
     51 else:
     52   self.render_zarr_metadata()

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:471, in ZarrMetadata.fetch_info(self)
    468 if res is not None:
    469   self.zarrays.extend(res)
--> 471 return self.zarr_to_info(self.zarrays, self.zattrs)

File ~/Data/heart-meshing/.venv/lib/python3.12/site-packages/cloudvolume/datasource/zarr/metadata.py:397, in ZarrMetadata.zarr_to_info(self, zarrays, zattrs)
    394   num_channels = 1
    396 if not zarrays:
--> 397   raise exceptions.InfoUnavailableError()
    399 base_res = self.spatial_resolution_in_nm(0, zattrs, zarrays)
    401 def extract_chunk_size(chunk_size):

InfoUnavailableError: 

@dstansby
Author

dstansby commented Dec 16, 2024

After a bit more debugging: this is because the Zarr array doesn't have a .zattrs file, which CloudVolume tries (and fails) to read.
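If the missing .zattrs is the blocker, one possible stopgap is to copy the bare array into a group as mip level 0 and give the group multiscales metadata. This sketch is hypothetical: wrap_array_as_multiscale is not part of CloudVolume or zarr, the metadata follows the OME-NGFF 0.4 multiscales layout with an assumed unit scale and assumed z, y, x spatial axes, and whether any particular CloudVolume release accepts the result is untested.

```python
import json
import shutil
from pathlib import Path


def wrap_array_as_multiscale(array_dir, out_dir, axes=("z", "y", "x")):
    """Copy a bare Zarr v2 array into a new group as level "0" and write
    minimal OME-NGFF 0.4-style multiscales metadata to the group's .zattrs.

    Hypothetical workaround; not verified against any CloudVolume release.
    """
    array_dir, out_dir = Path(array_dir), Path(out_dir)
    zarray_file = array_dir / ".zarray"
    if not zarray_file.is_file():
        raise ValueError(f"{array_dir} has no .zarray file")
    ndim = len(json.loads(zarray_file.read_text())["shape"])
    if ndim != len(axes):
        raise ValueError(f"array has {ndim} dims but {len(axes)} axes were given")

    out_dir.mkdir(parents=True)  # raises FileExistsError if out_dir exists
    # Copy metadata and chunk files unchanged into out_dir/0.
    shutil.copytree(array_dir, out_dir / "0")

    (out_dir / ".zgroup").write_text(json.dumps({"zarr_format": 2}))
    multiscales = [{
        "version": "0.4",
        "axes": [{"name": name, "type": "space"} for name in axes],
        "datasets": [{
            "path": "0",
            # Unit scale: the bare array carries no physical resolution.
            "coordinateTransformations": [
                {"type": "scale", "scale": [1.0] * ndim}
            ],
        }],
    }]
    (out_dir / ".zattrs").write_text(json.dumps({"multiscales": multiscales}, indent=2))
    return out_dir
```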

@william-silversmith
Contributor

I tried making a zarr array and was able to reproduce the problem with:

import zarr

z1 = zarr.open('data/example.zarr', mode='w', shape=(10000, 10000),
               chunks=(1000, 1000), dtype='i4')

This doesn't follow the usual pattern for Neuroglancer, but it looks like Neuroglancer can visualize it, so I guess I should try to support it.
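A bare array like this carries all of its geometry in .zarray (shape, chunks, dtype), so supporting it means deriving the volume description from that file alone. A hedged sketch with hypothetical helper names (read_zarray, chunk_grid). Note that the slice in the older traceback (metadata.py line 380, `chunks[2:][::-1]`) presumably assumes two leading non-spatial axes, which a 2D array like this one does not have.

```python
import json
from pathlib import Path


def read_zarray(array_dir):
    """Load a Zarr v2 array's .zarray metadata as a dict."""
    return json.loads((Path(array_dir) / ".zarray").read_text())


def chunk_grid(meta):
    """Chunks per axis, i.e. ceil(shape / chunks) using integer arithmetic."""
    return [-(-size // chunk) for size, chunk in zip(meta["shape"], meta["chunks"])]


# A hand-written subset of the metadata zarr would write for the array above.
example = {"shape": [10000, 10000], "chunks": [1000, 1000], "dtype": "<i4"}
print(chunk_grid(example))          # [10, 10]

# The older traceback's slice drops two leading axes and reverses the rest;
# on this 2D array nothing is left.
print(example["chunks"][2:][::-1])  # []
```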
