How do I use io_chunks to let xee auto-compute a chunk size relative to the request byte limit? #166

Open
adrianom-gh opened this issue Jul 25, 2024 · 2 comments
Labels
question Further information is requested

Comments

@adrianom-gh

adrianom-gh commented Jul 25, 2024

# By default, automatically optimize io_chunks.
    self.chunks = self._auto_chunks(max_dtype, request_byte_limit)
    if chunks == -1:
      self.chunks = -1
    elif chunks is not None and chunks != 'auto':
      self.chunks = self._assign_index_chunks(chunks)
  io_chunks (optional): Specifies the chunking strategy for loading data
        from EE. By default, this automatically calculates optional chunks based
        on the `request_byte_limit`.

The code above is from the xee source. It is supposed to optimize io_chunks automatically, based on the request byte limit (defined in the source as approximately 48 MB), when chunks=None. However, when I pass chunks=None to xarray.open_dataset, I get back an xarray Dataset that is not chunked at all. What am I missing here?
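To make the idea of "a chunk size relative to the request byte limit" concrete, here is a minimal sketch of how a chunk shape could be derived from a byte budget. This is not xee's `_auto_chunks`: the function name `sketch_auto_chunks`, the power-of-two strategy, and reading "approximately 48 MB" as 48 MiB are all assumptions made for illustration.

```python
def sketch_auto_chunks(dtype_bytes, request_byte_limit=48 * 1024 * 1024, index=1):
    """Hypothetical helper: choose power-of-two width/height within a byte budget.

    Not xee's implementation. The 48 MiB default is an assumed reading of the
    "approximately 48 MB" limit mentioned in this issue.
    """
    if dtype_bytes <= 0 or index <= 0:
        raise ValueError("dtype_bytes and index must be positive")
    # Number of pixels per index slice that fit in one request.
    max_pixels = request_byte_limit // (dtype_bytes * index)
    if max_pixels < 1:
        raise ValueError("request_byte_limit is too small for a single pixel")
    # Largest power-of-two side whose square still fits in the pixel budget.
    side = 1 << ((max_pixels.bit_length() - 1) // 2)
    width, height = side, side
    # Spend leftover budget by doubling the width once, if that still fits.
    if 2 * width * height <= max_pixels:
        width *= 2
    return {"index": index, "width": width, "height": height}


chunks = sketch_auto_chunks(dtype_bytes=4)  # 4 bytes per value, e.g. float32
print(chunks)  # {'index': 1, 'width': 4096, 'height': 2048}
```

With these assumptions a float32 request comes to 4096 × 2048 × 4 bytes = 32 MiB, which stays under the limit.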

@naschmitz naschmitz added the question Further information is requested label Sep 17, 2024
@naschmitz
Collaborator

Could you add a code sample to reproduce? chunks=None should automatically set an appropriate chunk size.

@adrianom-gh
Author

adrianom-gh commented Oct 21, 2024

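import ee
import xarray as xr
# Assumes Earth Engine is already authenticated and initialized (ee.Initialize()).
i = ee.ImageCollection(ee.Image("LANDSAT/LC08/C02/T1_TOA/LC08_044034_20140318"))
ds = xr.open_dataset(i, engine='ee', chunks=None)
print(ds['B1'])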

<xarray.DataArray 'B1' (time: 1, lon: 360, lat: 180)> Size: 259kB
[64800 values with dtype=float32]
Coordinates:
  * time     (time) datetime64[ns] 8B 2014-03-18T18:46:32.053000
  * lon      (lon) float64 3kB -179.5 -178.5 -177.5 -176.5 ... 177.5 178.5 179.5
  * lat      (lat) float64 1kB -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
Attributes:
    id:             B1
    data_type:      {'type': 'PixelType', 'precision': 'float'}
    dimensions:     [7661, 7801]
    crs:            EPSG:4326
    crs_transform:  [30, 0, 460785, 0, -30, 4264215]

I feel like I'm missing something obvious here, but taking this sample snippet from the xee README and printing a variable from ds shows that the data variables are plain xarray DataArrays, not Dask-backed arrays. Shouldn't passing chunks=None have auto-computed a chunk size and chunked the dataset?
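A quick way to confirm whether a variable is Dask-backed is to inspect its `.chunks` attribute, which xarray sets to `None` when no Dask array is behind it. For context, xarray's `open_dataset` documentation describes `chunks=None` as skipping Dask altogether, while `chunks={}` or `chunks='auto'` requests Dask arrays using the engine's preferred chunks, so the `chunks` argument passed here may not be the same setting as xee's `io_chunks`. The helper below is a hypothetical sketch (the name `chunking_summary` is not part of xarray or xee); it relies only on `.chunks`, so stand-in objects are used in place of a real Earth Engine dataset.

```python
from types import SimpleNamespace


def chunking_summary(variable):
    """Describe chunking for any object exposing an xarray-style `.chunks`."""
    chunks = getattr(variable, "chunks", None)
    if chunks is None:
        return "not chunked (no Dask array behind this variable)"
    # `.chunks` is a tuple of per-dimension block lengths; report the largest.
    largest = tuple(max(dim_chunks) for dim_chunks in chunks)
    return f"Dask-backed, largest block per dimension: {largest}"


# Stand-ins that mimic DataArray.chunks for the two cases (illustrative values).
eager = SimpleNamespace(chunks=None)
lazy = SimpleNamespace(chunks=((1,), (256, 104), (180,)))

print(chunking_summary(eager))
print(chunking_summary(lazy))
```

On a real dataset the same call would be `chunking_summary(ds['B1'])`.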
