From e08438b72ce18cae21e70f24c2416f04ee516f43 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 3 Apr 2022 20:36:07 -0600 Subject: [PATCH 01/92] preparing for v1.6.0 release --- Changelog | 4 ++-- README.md | 8 ++++++++ 2 files changed, 10 insertions(+), 2 deletions(-) diff --git a/Changelog b/Changelog index f95f8edae..92968f30c 100644 --- a/Changelog +++ b/Changelog @@ -1,5 +1,5 @@ - version 1.6.0 (not yet released) -================================= + version 1.6.0 (tag v1.6.0rel) +============================== * add support for new quantization functionality in netcdf-c 4.8.2 via "signficant_digits" and "quantize_mode" kwargs in Dataset.createVariable. Default quantization_mode is "BitGroom", but alternate methods "BitRound" and GranularBitRound" also supported. diff --git a/README.md b/README.md index d6f31974f..c569dacc5 100644 --- a/README.md +++ b/README.md @@ -10,6 +10,14 @@ ## News For details on the latest updates, see the [Changelog](https://github.com/Unidata/netcdf4-python/blob/master/Changelog). +??/??/2022: Version [1.6.0](https://pypi.python.org/pypi/netCDF4/1.6.0) released. Support +for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can +dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead +of just dimension names). 'compression' kwarg added to Dataset.createVariable (in preparation for +the available of new compression algorithms in netcdf-c (such as zstd). Currently only 'zlib' +supported. Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just +like python open). + 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. 6/22/2021: Version [1.5.7](https://pypi.python.org/pypi/netCDF4/1.5.7) released. From e91599d20347e01e81c9c49dce0b92fb7029b846 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 3 Apr 2022 20:37:06 -0600 Subject: [PATCH 02/92] update --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index c569dacc5..487364adc 100644 --- a/README.md +++ b/README.md @@ -16,7 +16,7 @@ dramatically improve compression. Dataset.createVariable now accepts dimension of just dimension names). 'compression' kwarg added to Dataset.createVariable (in preparation for the available of new compression algorithms in netcdf-c (such as zstd). Currently only 'zlib' supported. Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just -like python open). +like python open). Working arm64 and universal2 wheels for Apple Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. From 890432c446f87510d2ec8ffa209e21832149122a Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 3 Apr 2022 20:42:20 -0600 Subject: [PATCH 03/92] update --- README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 487364adc..c8eedd52b 100644 --- a/README.md +++ b/README.md @@ -14,8 +14,9 @@ For details on the latest updates, see the [Changelog](https://github.com/Unidat for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead of just dimension names). 
'compression' kwarg added to Dataset.createVariable (in preparation for -the available of new compression algorithms in netcdf-c (such as zstd). Currently only 'zlib' -supported. Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just +the available of new compression algorithms, such as + [zstd](https://github.com/facebook/zstd), in netcdf-c). Currently only 'zlib' supported. +Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just like python open). Working arm64 and universal2 wheels for Apple Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. From 216f7bf1e7d8f91f771e4e19fc5d1aa943fe1d09 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 3 Apr 2022 20:44:39 -0600 Subject: [PATCH 04/92] update --- README.md | 2 +- docs/index.html | 526 ++++++++++++++++++++++-------------------------- 2 files changed, 238 insertions(+), 290 deletions(-) diff --git a/README.md b/README.md index c8eedd52b..b348c8f88 100644 --- a/README.md +++ b/README.md @@ -17,7 +17,7 @@ of just dimension names). 'compression' kwarg added to Dataset.createVariable (i the available of new compression algorithms, such as [zstd](https://github.com/facebook/zstd), in netcdf-c). Currently only 'zlib' supported. Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just -like python open). Working arm64 and universal2 wheels for Apple Silicon now available on pypi. +like python open). Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. diff --git a/docs/index.html b/docs/index.html index 64f10938b..8213f510f 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,24 +3,24 @@ - + netCDF4 API documentation + - - - - - - - -

@@ -475,7 +474,7 @@

Introduction

and should be familiar to users of that module.

Most new features of netCDF 4 are implemented, such as multiple -unlimited dimensions, groups and data compression. All the new +unlimited dimensions, groups and zlib data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound (struct), variable length (vlen) and enumerated (enum) data types are supported, but not the opaque data type. @@ -577,7 +576,7 @@

Creating/Opening/Closing a netCDF

Here's an example:

-
>>> from netCDF4 import Dataset
+
>>> from netCDF4 import Dataset
 >>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
 >>> print(rootgrp.data_model)
 NETCDF4
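As an illustrative aside (not part of this patch), the Dataset methods documented further below can be used to query the open file before it is closed, for example:

>>> print(rootgrp.filepath())
test.nc
>>> print(rootgrp.isopen())
True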
@@ -606,7 +605,7 @@ 

Groups in a netCDF file

NETCDF4 formatted files support Groups, if you try to create a Group in a netCDF 3 file you will get an error message.

-
>>> rootgrp = Dataset("test.nc", "a")
+
>>> rootgrp = Dataset("test.nc", "a")
 >>> fcstgrp = rootgrp.createGroup("forecasts")
 >>> analgrp = rootgrp.createGroup("analyses")
 >>> print(rootgrp.groups)
@@ -630,7 +629,7 @@ 

Groups in a netCDF file

that group. To simplify the creation of nested groups, you can use a unix-like path as an argument to Dataset.createGroup.

-
>>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
+
>>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
 >>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")
 
@@ -644,7 +643,7 @@

Groups in a netCDF file

to walk the directory tree. Note that printing the Dataset or Group object yields summary information about its contents.

-
>>> def walktree(top):
+
>>> def walktree(top):
 ...     yield top.groups.values()
 ...     for value in top.groups.values():
 ...         yield from walktree(value)
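>>> # a sketch of how the generator is then used in the rest of the tutorial:
>>> for children in walktree(rootgrp):
...     for child in children:
...         print(child)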
@@ -694,7 +693,7 @@ 

Dimensions in a netCDF file

dimension is a new netCDF 4 feature, in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.

-
>>> level = rootgrp.createDimension("level", None)
+
>>> level = rootgrp.createDimension("level", None)
 >>> time = rootgrp.createDimension("time", None)
 >>> lat = rootgrp.createDimension("lat", 73)
 >>> lon = rootgrp.createDimension("lon", 144)
@@ -702,7 +701,7 @@ 

Dimensions in a netCDF file

All of the Dimension instances are stored in a python dictionary.

-
>>> print(rootgrp.dimensions)
+
>>> print(rootgrp.dimensions)
 {'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}
 
@@ -711,7 +710,7 @@

Dimensions in a netCDF file

Dimension.isunlimited method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.

-
>>> print(len(lon))
+
>>> print(len(lon))
 144
 >>> print(lon.isunlimited())
 False
@@ -723,7 +722,7 @@ 

Dimensions in a netCDF file

provides useful summary info, including the name and length of the dimension, and whether it is unlimited.

-
>>> for dimobj in rootgrp.dimensions.values():
+
>>> for dimobj in rootgrp.dimensions.values():
 ...     print(dimobj)
 <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
 <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
@@ -768,7 +767,7 @@ 

Variables in a netCDF file

method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

-
>>> times = rootgrp.createVariable("time","f8",("time",))
+
>>> times = rootgrp.createVariable("time","f8",("time",))
 >>> levels = rootgrp.createVariable("level","i4",("level",))
 >>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
 >>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
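>>> # the temp variable used in the later snippets is created the same way
>>> # (the same call appears again in the compression section below):
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
>>> temp.units = "K"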
@@ -780,7 +779,7 @@ 

Variables in a netCDF file

To get summary info on a Variable instance in an interactive session, just print it.

-
>>> print(temp)
+
>>> print(temp)
 <class 'netCDF4._netCDF4.Variable'>
 float32 temp(time, level, lat, lon)
     units: K
@@ -791,7 +790,7 @@ 

Variables in a netCDF file

You can use a path to create a Variable inside a hierarchy of groups.

-
>>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
+
>>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
 

If the intermediate groups do not yet exist, they will be created.

@@ -799,7 +798,7 @@

Variables in a netCDF file

You can also query a Dataset or Group instance directly to obtain Group or Variable instances using paths.

-
>>> print(rootgrp["/forecasts/model1"])  # a Group instance
+
>>> print(rootgrp["/forecasts/model1"])  # a Group instance
 <class 'netCDF4._netCDF4.Group'>
 group /forecasts/model1:
     dimensions(sizes): 
@@ -817,7 +816,7 @@ 

Variables in a netCDF file

All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:

-
>>> print(rootgrp.variables)
+
>>> print(rootgrp.variables)
 {'time': <class 'netCDF4._netCDF4.Variable'>
 float64 time(time)
 unlimited dimensions: time
@@ -860,7 +859,7 @@ 

Attributes in a netCDF file

variables. Attributes can be strings, numbers or sequences. Returning to our example,

-
>>> import time
+
>>> import time
 >>> rootgrp.description = "bogus example script"
 >>> rootgrp.history = "Created " + time.ctime(time.time())
 >>> rootgrp.source = "netCDF4 python module tutorial"
@@ -878,7 +877,7 @@ 

Attributes in a netCDF file

built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

-
>>> for name in rootgrp.ncattrs():
+
>>> for name in rootgrp.ncattrs():
 ...     print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
 Global attr description = bogus example script
 Global attr history = Created Mon Jul  8 14:19:41 2019
@@ -889,7 +888,7 @@ 

Attributes in a netCDF file

instance provides all the netCDF attribute name/value pairs in a python dictionary:

-
>>> print(rootgrp.__dict__)
+
>>> print(rootgrp.__dict__)
 {'description': 'bogus example script', 'history': 'Created Mon Jul  8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}
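A small sketch (not from the patch itself) of removing an attribute again, using the delncattr machinery documented further below; the attribute name here is made up for illustration:

>>> rootgrp.comment = "a scratch attribute"
>>> del rootgrp.comment            # equivalent to rootgrp.delncattr("comment")
>>> "comment" in rootgrp.ncattrs()
False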
 
@@ -902,7 +901,7 @@

Writing data

Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

-
>>> import numpy as np
+
>>> import numpy as np
 >>> lats =  np.arange(-90,91,2.5)
 >>> lons =  np.arange(-180,180,2.5)
 >>> latitudes[:] = lats
@@ -922,7 +921,7 @@ 

Writing data objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.

-
>>> # append along two unlimited dimensions by assigning to slice.
+
>>> # append along two unlimited dimensions by assigning to slice.
 >>> nlats = len(rootgrp.dimensions["lat"])
 >>> nlons = len(rootgrp.dimensions["lon"])
 >>> print("temp shape before adding data = {}".format(temp.shape))
@@ -942,7 +941,7 @@ 

Writing data along the level dimension of the variable temp, even though no data has yet been assigned to levels.

-
>>> # now, assign data to levels dimension variable.
+
>>> # now, assign data to levels dimension variable.
 >>> levels[:] =  [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
 
@@ -955,7 +954,7 @@

Writing data allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in fortran). This means that

-
>>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
+
>>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
 (4, 4)
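To make the contrast with numpy explicit (an illustrative aside, not part of the patch), the same index lists applied to the slice after it has been read into memory use numpy's pairwise fancy indexing and return only the 4 paired points:

>>> temp[0, 0, :, :][[0,1,2,3], [0,1,2,3]].shape  # numpy indexing on the in-memory array
(4,)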
 
@@ -973,14 +972,14 @@

Writing data

For example,

-
>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
+
>>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
 

will extract time indices 0,2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

-
>>> print("shape of fancy temp slice = {}".format(tempdat.shape))
+
>>> print("shape of fancy temp slice = {}".format(tempdat.shape))
 shape of fancy temp slice = (3, 3, 36, 71)
 
@@ -1013,7 +1012,7 @@

Dealing with time coordinates

provided by cftime to do just that. Here's an example of how they can be used:

-
>>> # fill in times.
+
>>> # fill in times.
 >>> from datetime import datetime, timedelta
 >>> from cftime import num2date, date2num
 >>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
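>>> # a sketch of the conversion step that follows in the full tutorial,
>>> # assuming the usual units/calendar attributes on the time variable:
>>> times.units = "hours since 0001-01-01 00:00:00.0"
>>> times.calendar = "gregorian"
>>> times[:] = date2num(dates, units=times.units, calendar=times.calendar)
>>> dates_check = num2date(times[:], units=times.units, calendar=times.calendar)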
@@ -1053,7 +1052,7 @@ 

Reading data from a multi NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

-
>>> for nf in range(10):
+
>>> for nf in range(10):
 ...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
 ...         _ = f.createDimension("x",None)
 ...         x = f.createVariable("x","i",("x",))
@@ -1062,7 +1061,7 @@ 

Reading data from a multi

Now read all the files back in at once with MFDataset

-
>>> from netCDF4 import MFDataset
+
>>> from netCDF4 import MFDataset
 >>> f = MFDataset("mftest*nc")
 >>> print(f.variables["x"][:])
 [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
@@ -1079,9 +1078,9 @@ 

Efficient compression of netC

Data stored in netCDF 4 Variable objects can be compressed and decompressed on the fly. The parameters for the compression are -determined by the compression, complevel and shuffle keyword arguments +determined by the zlib, complevel and shuffle keyword arguments to the Dataset.createVariable method. To turn on -compression, set compression=zlib. The complevel keyword regulates the +compression, set zlib=True. The complevel keyword regulates the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of complevel is 4. Setting shuffle=False will turn @@ -1101,7 +1100,7 @@

Efficient compression of netC

If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of -0.1 degrees), you can dramatically improve compression by +0.1 degrees), you can dramatically improve zlib compression by quantizing (or truncating) the data. There are two methods supplied for doing this. You can use the least_significant_digit keyword argument to Dataset.createVariable to specify @@ -1124,22 +1123,22 @@

Efficient compression of netC

In our example, try replacing the line

-
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
+
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
 

with

-
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
+
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True)
 

and then

-
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
+
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True,least_significant_digit=3)
 

or with netcdf-c >= 4.8.2

-
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
+
>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True,significant_digits=4)
 

and see how much smaller the resulting files are.
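This patch series also documents a quantize_mode keyword (default 'BitGroom'); a hedged sketch combining it with the kwargs above, assuming netcdf-c >= 4.9.0:

>>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4,quantize_mode='GranularBitRound')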

@@ -1160,7 +1159,7 @@

Beyond ho Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:

-
>>> f = Dataset("complex.nc","w")
+
>>> f = Dataset("complex.nc","w")
 >>> size = 3 # length of 1-d complex array
 >>> # create sample complex data.
 >>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
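>>> # a sketch of how the example continues in the full tutorial: a compound
>>> # type mirroring complex128 is created and used for a variable
>>> complex128 = np.dtype([("real", np.float64), ("imag", np.float64)])
>>> complex128_t = f.createCompoundType(complex128, "complex128")
>>> x_dim = f.createDimension("x_dim", None)
>>> v = f.createVariable("cmplx_var", complex128_t, "x_dim")
>>> data = np.empty(size, complex128)          # numpy structured array
>>> data["real"] = datac.real
>>> data["imag"] = datac.imag
>>> v[:] = data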
@@ -1196,7 +1195,7 @@ 

Beyond ho in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

-
>>> print(f)
+
>>> print(f)
 <class 'netCDF4._netCDF4.Dataset'>
 root group (NETCDF4 data model, file format HDF5):
     dimensions(sizes): x_dim(3)
@@ -1221,7 +1220,7 @@ 

Variable-length (vlen) data types

data type, use the Dataset.createVLType method of a Dataset or Group instance.

-
>>> f = Dataset("tst_vlen.nc","w")
+
>>> f = Dataset("tst_vlen.nc","w")
 >>> vlen_t = f.createVLType(np.int32, "phony_vlen")
 
@@ -1231,7 +1230,7 @@

Variable-length (vlen) data types

but compound data types cannot. A new variable can then be created using this datatype.

-
>>> x = f.createDimension("x",3)
+
>>> x = f.createDimension("x",3)
 >>> y = f.createDimension("y",4)
 >>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
 
@@ -1244,7 +1243,7 @@

Variable-length (vlen) data types

In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

-
>>> import random
+
>>> import random
 >>> random.seed(54321)
 >>> data = np.empty(len(y)*len(x),object)
 >>> for n in range(len(y)*len(x)):
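...     # (sketch of the loop body, following the full tutorial: 1-D int32 arrays of random length)
...     data[n] = np.arange(random.randint(1,10), dtype="int32") + 1
>>> data = np.reshape(data, (len(y), len(x)))
>>> vlvar[:] = data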
@@ -1284,7 +1283,7 @@ 

Variable-length (vlen) data types

with fixed length greater than 1) when calling the Dataset.createVariable method.

-
>>> z = f.createDimension("z",10)
+
>>> z = f.createDimension("z",10)
 >>> strvar = f.createVariable("strvar", str, "z")
 
@@ -1292,7 +1291,7 @@

Variable-length (vlen) data types

random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

-
>>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
+
>>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
 >>> data = np.empty(10,"O")
 >>> for n in range(10):
 ...     stringlen = random.randint(2,12)
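...     # (sketch of the remaining loop body and assignment, per the full tutorial)
...     data[n] = "".join([random.choice(chars) for i in range(stringlen)])
>>> strvar[:] = data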
@@ -1331,7 +1330,7 @@ 

Enum data type

values and their names are used to define an Enum data type using Dataset.createEnumType.

-
>>> nc = Dataset('clouds.nc','w')
+
>>> nc = Dataset('clouds.nc','w')
 >>> # python dict with allowed values and their names.
 >>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
 ... 'Stratus': 2, 'Clear': 0,
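... 'Cumulonimbus': 1, 'Stratocumulus': 3}  # (a sketch: remaining names closing the dict)
>>> # the dict is then used to create the Enum type, per the tutorial:
>>> cloud_type = nc.createEnumType(np.uint8, 'cloud_t', enum_dict)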
@@ -1349,7 +1348,7 @@ 

Enum data type

is made to write an integer value not associated with one of the specified names.

-
>>> time = nc.createDimension('time',None)
+
>>> time = nc.createDimension('time',None)
 >>> # create a 1d variable of type 'cloud_type'.
 >>> # The fill_value is set to the 'Missing' named value.
 >>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
@@ -1386,7 +1385,7 @@ 

Parallel IO

available. To use parallel IO, your program must be running in an MPI environment using mpi4py.

-
>>> from mpi4py import MPI
+
>>> from mpi4py import MPI
 >>> import numpy as np
 >>> from netCDF4 import Dataset
 >>> rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)
@@ -1398,7 +1397,7 @@ 

Parallel IO

when a new dataset is created or an existing dataset is opened, use the parallel keyword to enable parallel access.

-
>>> nc = Dataset('parallel_test.nc','w',parallel=True)
+
>>> nc = Dataset('parallel_test.nc','w',parallel=True)
 

The optional comm keyword may be used to specify a particular @@ -1406,7 +1405,7 @@

Parallel IO

can now write to the file independently. In this example the process rank is written to a different variable index on each task

-
>>> d = nc.createDimension('dim',4)
+
>>> d = nc.createDimension('dim',4)
 >>> v = nc.createVariable('var', np.int64, 'dim')
 >>> v[rank] = rank
 >>> nc.close()
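As a hedged aside (assuming the same 4-process MPI setup), the write can also be done in collective mode via Variable.set_collective, which is documented further below:

>>> nc = Dataset('parallel_test.nc', 'a', parallel=True)
>>> v = nc.variables['var']
>>> v.set_collective(True)   # switch from the default independent mode
>>> v[rank] = rank
>>> nc.close()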
@@ -1473,7 +1472,7 @@ 

Dealing with strings

stringtochar is used to convert the numpy string array to an array of characters with one more dimension. For example,

-
>>> from netCDF4 import stringtochar
+
>>> from netCDF4 import stringtochar
 >>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
 >>> _ = nc.createDimension('nchars',3)
 >>> _ = nc.createDimension('nstrings',None)
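>>> # sketch of the remaining steps from the full tutorial: an 'S1' variable is
>>> # created and a numpy 'S3' string array is converted with stringtochar
>>> v = nc.createVariable('strings', 'S1', ('nstrings', 'nchars'))
>>> datain = np.array(['foo', 'bar'], dtype='S3')
>>> v[:] = stringtochar(datain)   # v[:] now holds a 2x3 array of single characters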
@@ -1506,7 +1505,7 @@ 

Dealing with strings

character array dtype under the hood when creating the netcdf compound type. Here's an example:

-
>>> nc = Dataset('compoundstring_example.nc','w')
+
>>> nc = Dataset('compoundstring_example.nc','w')
 >>> dtype = np.dtype([('observation', 'f4'),
 ...                      ('station_name','S10')])
 >>> station_data_t = nc.createCompoundType(dtype,'station_data')
@@ -1551,7 +1550,7 @@ 

In-memory (diskless) Datasets

object representing the Dataset. Below are examples illustrating both approaches.

-
>>> # create a diskless (in-memory) Dataset,
+
>>> # create a diskless (in-memory) Dataset,
 >>> # and persist the file to disk when it is closed.
 >>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
 >>> d = nc.createDimension('x',None)
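>>> # sketch of the remaining steps: write some data, then close -- with
>>> # persist=True the in-memory buffer is written to diskless_example.nc on close
>>> v = nc.createVariable('v', np.int64, 'x')
>>> v[0:5] = np.arange(5)
>>> nc.close()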
@@ -1613,7 +1612,7 @@ 

In-memory (diskless) Datasets

the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

-

contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

+

contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

copyright: 2008 by Jeffrey Whitaker.

@@ -1626,7 +1625,7 @@

In-memory (diskless) Datasets

View Source -
# init for netCDF4. package
+            
# init for netCDF4. package
 # Docstring comes from extension module _netCDF4.
 from ._netCDF4 import *
 # Need explicit imports for names beginning with underscores
@@ -1652,7 +1651,7 @@ 

In-memory (diskless) Datasets

Dataset:
- +

A netCDF Dataset is a collection of dimensions, groups, variables and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file. See Dataset.__init__ for more @@ -1730,7 +1729,7 @@

In-memory (diskless) Datasets

Dataset()
- +

__init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, keepweakref=False, memory=None, encoding=None, parallel=False, comm=None, info=None, format='NETCDF4')

@@ -1836,7 +1835,7 @@

In-memory (diskless) Datasets

filepath(unknown):
- +

filepath(self,encoding=None)

Get the file system path (or the opendap URL) which was used to @@ -1855,7 +1854,7 @@

In-memory (diskless) Datasets

close(unknown):
- +

close(self)

Close the Dataset.

@@ -1871,7 +1870,7 @@

In-memory (diskless) Datasets

isopen(unknown):
- +

isopen(self)

Is the Dataset open or closed?

@@ -1887,7 +1886,7 @@

In-memory (diskless) Datasets

sync(unknown):
- +

sync(self)

Writes all buffered data in the Dataset to the disk file.

@@ -1903,7 +1902,7 @@

In-memory (diskless) Datasets

set_fill_on(unknown):
- +

set_fill_on(self)

Sets the fill mode for a Dataset open for writing to on.

@@ -1927,7 +1926,7 @@

In-memory (diskless) Datasets

set_fill_off(unknown):
- +

set_fill_off(self)

Sets the fill mode for a Dataset open for writing to off.

@@ -1947,7 +1946,7 @@

In-memory (diskless) Datasets

createDimension(unknown):
- +

createDimension(self, dimname, size=None)

Creates a new dimension with the given dimname and size.

@@ -1971,7 +1970,7 @@

In-memory (diskless) Datasets

renameDimension(unknown):
- +

renameDimension(self, oldname, newname)

rename a Dimension named oldname to newname.

@@ -1987,7 +1986,7 @@

In-memory (diskless) Datasets

createCompoundType(unknown):
- +

createCompoundType(self, datatype, datatype_name)

Creates a new compound data type named datatype_name from the numpy @@ -2012,7 +2011,7 @@

In-memory (diskless) Datasets

createVLType(unknown):
- +

createVLType(self, datatype, datatype_name)

Creates a new VLEN data type named datatype_name from a numpy @@ -2032,7 +2031,7 @@

In-memory (diskless) Datasets

createEnumType(unknown):
- +

createEnumType(self, datatype, datatype_name, enum_dict)

Creates a new Enum data type named datatype_name from a numpy @@ -2053,11 +2052,10 @@

In-memory (diskless) Datasets

createVariable(unknown):
- -

createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, + +

createVariable(self, varname, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, -endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', -fill_value=None, chunk_cache=None)

+endian='native', least_significant_digit=None, significant_digits=None, fill_value=None, chunk_cache=None)

Creates a new variable with the given varname, datatype, and dimensions. If dimensions are not given, the variable is assumed to be @@ -2089,17 +2087,11 @@

In-memory (diskless) Datasets

previously using Dataset.createDimension. The default value is an empty tuple, which means the variable is a scalar.

-

If the optional keyword argument compression is set, the data will be -compressed in the netCDF file using the specified compression algorithm. -Currently only 'zlib' is supported. Default is None (no compression).

-

If the optional keyword zlib is True, the data will be compressed in -the netCDF file using zlib compression (default False). The use of this option is -deprecated in favor of compression='zlib'.

+the netCDF file using gzip compression (default False).

-

The optional keyword complevel is an integer between 0 and 9 describing -the level of compression desired (default 4). Ignored if compression=None. -A value of zero disables compression.

+

The optional keyword complevel is an integer between 1 and 9 describing +the level of compression desired (default 4). Ignored if zlib=False.

If the optional keyword shuffle is True, the HDF5 shuffle filter will be applied before compressing the data (default True). This @@ -2133,17 +2125,17 @@

In-memory (diskless) Datasets

opposite format as the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness.

-

The compression, zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian +

The zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian keywords are silently ignored for netCDF 3 files that do not use HDF5.

The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before -any data is written to it, defaults given in the dict netCDF4.default_fillvals). +any data is written to it, defaults given in the dict netCDF4.default_fillvals). If fill_value is set to False, then the variable is not pre-filled.

If the optional keyword parameters least_significant_digit or significant_digits are specified, variable data will be truncated (quantized). In conjunction -with compression='zlib' this produces 'lossy', but significantly more +with zlib=True this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is @@ -2153,9 +2145,9 @@

In-memory (diskless) Datasets

in unpacked data that is a reliable value." Default is None, or no quantization, or 'lossless' compression. If significant_digits=3 then the data will be quantized so that three significant digits are retained, independent -of the floating point exponent. The keyword argument quantize_mode controls -the quantization algorithm (default 'BitGroom'). The alternate 'GranularBitRound' -algorithm may result in better compression for typical geophysical datasets. +of the floating point exponent. If significant_digits is given as a negative +number, then an alternate algorithm for quantization ('granular bitgrooming') is used +that may result in better compression for typical geophysical datasets. This significant_digits kwarg is only available with netcdf-c >= 4.8.2, and only works with NETCDF4 or NETCDF4_CLASSIC formatted files.

@@ -2205,7 +2197,7 @@

In-memory (diskless) Datasets

renameVariable(unknown):
- +

renameVariable(self, oldname, newname)

rename a Variable named oldname to newname

@@ -2221,7 +2213,7 @@

In-memory (diskless) Datasets

createGroup(unknown):
- +

createGroup(self, groupname)

Creates a new Group with the given groupname.

@@ -2247,7 +2239,7 @@

In-memory (diskless) Datasets

ncattrs(unknown):
- +

ncattrs(self)

return netCDF global attribute names for this Dataset or Group in a list.

@@ -2263,7 +2255,7 @@

In-memory (diskless) Datasets

setncattr(unknown):
- +

setncattr(self,name,value)

set a netCDF dataset or group attribute using name,value pair. @@ -2281,7 +2273,7 @@

In-memory (diskless) Datasets

setncattr_string(unknown):
- +

setncattr_string(self,name,value)

set a netCDF dataset or group string attribute using name,value pair. @@ -2299,7 +2291,7 @@

In-memory (diskless) Datasets

setncatts(unknown):
- +

setncatts(self,attdict)

set a bunch of netCDF dataset or group attributes at once using a python dictionary. @@ -2318,7 +2310,7 @@

In-memory (diskless) Datasets

getncattr(unknown):
- +

getncattr(self,name)

retrieve a netCDF dataset or group attribute. @@ -2339,7 +2331,7 @@

In-memory (diskless) Datasets

delncattr(unknown):
- +

delncattr(self,name,value)

delete a netCDF dataset or group attribute. Use if you need to delete a @@ -2357,7 +2349,7 @@

In-memory (diskless) Datasets

renameAttribute(unknown):
- +

renameAttribute(self, oldname, newname)

rename a Dataset or Group attribute named oldname to newname.

@@ -2373,7 +2365,7 @@

In-memory (diskless) Datasets

renameGroup(unknown):
- +

renameGroup(self, oldname, newname)

rename a Group named oldname to newname (requires netcdf >= 4.3.1).

@@ -2389,7 +2381,7 @@

In-memory (diskless) Datasets

set_auto_chartostring(unknown):
- +

set_auto_chartostring(self, True_or_False)

Call Variable.set_auto_chartostring for all variables contained in this Dataset or @@ -2414,7 +2406,7 @@

In-memory (diskless) Datasets

set_auto_maskandscale(unknown):
- +

set_auto_maskandscale(self, True_or_False)

Call Variable.set_auto_maskandscale for all variables contained in this Dataset or @@ -2437,7 +2429,7 @@

In-memory (diskless) Datasets

set_auto_mask(unknown):
- +

set_auto_mask(self, True_or_False)

Call Variable.set_auto_mask for all variables contained in this Dataset or @@ -2461,7 +2453,7 @@

In-memory (diskless) Datasets

set_auto_scale(unknown):
- +

set_auto_scale(self, True_or_False)

Call Variable.set_auto_scale for all variables contained in this Dataset or @@ -2484,7 +2476,7 @@

In-memory (diskless) Datasets

set_always_mask(unknown):
- +

set_always_mask(self, True_or_False)

Call Variable.set_always_mask for all variables contained in @@ -2512,7 +2504,7 @@

In-memory (diskless) Datasets

set_ncstring_attrs(unknown):
- +

set_ncstring_attrs(self, True_or_False)

Call Variable.set_ncstring_attrs for all variables contained in @@ -2537,7 +2529,7 @@

In-memory (diskless) Datasets

get_variables_by_attributes(unknown):
- +

get_variables_by_attributes(self, **kwargs)

Returns a list of variables that match specific conditions.

@@ -2545,7 +2537,7 @@

In-memory (diskless) Datasets

Can pass in key=value parameters and variables are returned that contain all of the matches. For example,

-
>>> # Get variables with x-axis attribute.
+
>>> # Get variables with x-axis attribute.
 >>> vs = nc.get_variables_by_attributes(axis='X')
 >>> # Get variables with matching "standard_name" attribute
 >>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')
@@ -2556,7 +2548,7 @@ 

In-memory (diskless) Datasets

the attribute value. None is given as the attribute value when the attribute does not exist on the variable. For example,

-
>>> # Get Axis variables
+
>>> # Get Axis variables
 >>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
 >>> # Get variables that don't have an "axis" attribute
 >>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
@@ -2575,7 +2567,7 @@ 

In-memory (diskless) Datasets

fromcdl(unknown):
- +

fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')

call ncgen via subprocess to create Dataset from CDL @@ -2605,7 +2597,7 @@

In-memory (diskless) Datasets

tocdl(unknown):
- +

tocdl(self, coordvars=False, data=False, outfile=None)

call ncdump via subprocess to create CDL @@ -2624,10 +2616,9 @@

In-memory (diskless) Datasets

#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Dataset' objects>
-

string name of Group instance

@@ -2636,121 +2627,109 @@

In-memory (diskless) Datasets

#   - groups + groups = <attribute 'groups' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - dimensions + dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - variables + variables = <attribute 'variables' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - disk_format + disk_format = <attribute 'disk_format' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - path + path = <attribute 'path' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - parent + parent = <attribute 'parent' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - file_format + file_format = <attribute 'file_format' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - data_model + data_model = <attribute 'data_model' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - cmptypes + cmptypes = <attribute 'cmptypes' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - vltypes + vltypes = <attribute 'vltypes' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - enumtypes + enumtypes = <attribute 'enumtypes' of 'netCDF4._netCDF4.Dataset' objects>
-
#   - keepweakref + keepweakref = <attribute 'keepweakref' of 'netCDF4._netCDF4.Dataset' objects>
-

@@ -2763,7 +2742,7 @@

In-memory (diskless) Datasets

Variable: - +

A netCDF Variable is used to read and write netCDF data. They are analogous to numpy array objects. See Variable.__init__ for more details.

@@ -2809,20 +2788,16 @@

In-memory (diskless) Datasets

truncated to this decimal place when it is assigned to the Variable instance. If None, the data is not truncated.

-

significant_digits: New in version 1.6.0. Describes the number of significant +

significant_digits: New in version 1.6.0. Describes the number of significant digits in the data the contains a reliable value. Data is truncated to retain this number of significant digits when it is assigned to the Variable instance. If None, the data is not truncated. +If specified as a negative number, an alternative quantization algorithm is used +that often produces better compression. Only available with netcdf-c >= 4.8.2, and only works with NETCDF4 or NETCDF4_CLASSIC formatted files. The number of significant digits used in the quantization of variable data can be -obtained using the Variable.significant_digits method. Default None - -no quantization done.

- -

quantize_mode: New in version 1.6.0. Controls -the quantization algorithm (default 'BitGroom'). The alternate 'GranularBitRound' -algorithm may result in better compression for typical geophysical datasets. -Ignored if significant_digts not specified.

+obtained using the Variable.significant_digits method.

__orthogonal_indexing__: Always True. Indicates to client code that the object supports 'orthogonal indexing', which means that slices @@ -2845,8 +2820,8 @@

In-memory (diskless) Datasets

Variable()
- -

__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, + +

__init__(self, group, name, datatype, dimensions=(), zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)

@@ -2880,19 +2855,15 @@

In-memory (diskless) Datasets

(defined previously with createDimension). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).

-

compression: compression algorithm to use. Default None. Currently -only 'zlib' is supported.

-

zlib: if True, data assigned to the Variable -instance is compressed on disk. Default False. Deprecated - use -compression='zlib' instead.

+instance is compressed on disk. Default False.

-

complevel: the level of compression to use (1 is the fastest, +

complevel: the level of zlib compression to use (1 is the fastest, but poorest compression, 9 is the slowest but best compression). Default 4. -Ignored if compression=None. A value of 0 disables compression.

+Ignored if zlib=False.

shuffle: if True, the HDF5 shuffle filter is applied -to improve compression. Default True. Ignored if compression=None.

+to improve compression. Default True. Ignored if zlib=False.

fletcher32: if True (default False), the Fletcher32 checksum algorithm is used for error detection.

@@ -2922,33 +2893,30 @@

In-memory (diskless) Datasets

some performance advantage to be gained by setting the endian-ness. For netCDF 3 files (that don't use HDF5), only endian='native' is allowed.

-

The compression, zlib, complevel, shuffle, fletcher32, contiguous and chunksizes +

The zlib, complevel, shuffle, fletcher32, contiguous and chunksizes keywords are silently ignored for netCDF 3 files that do not use HDF5.

-

least_significant_digit: If this or significant_digits are specified, +

least_significant_digit: If this or significant_digits are specified, variable data will be truncated (quantized).
-In conjunction with compression='zlib' this produces +In conjunction with zlib=True this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scaledata)/scale, where scale = 2*bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

-

significant_digits: New in version 1.6.0. +

significant_digits: New in version 1.6.0. As described for least_significant_digit except the number of significant digits retained is prescribed independent -of the floating point exponent. Default None - no quantization done.

- -

quantize_mode: New in version 1.6.0. Controls -the quantization algorithm (default 'BitGroom'). The alternate 'GranularBitRound' -algorithm may result in better compression for typical geophysical datasets. -Ignored if significant_digts not specified.

+of the floating point exponent. If specified as a negative number, +an alternative quantization algorithm is used that often produces +better compression. Only available with netcdf-c >= 4.8.2.

fill_value: If specified, the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it) is replaced with this value. If fill_value is set to False, then the variable is not pre-filled. The default netCDF fill values can be found -in the dictionary netCDF4.default_fillvals.

+in the dictionary netCDF4.default_fillvals.

chunk_cache: If specified, sets the chunk cache size for this variable. Persists as long as Dataset is open. Use set_var_chunk_cache to @@ -2969,7 +2937,7 @@

In-memory (diskless) Datasets

group(unknown):
- +

group(self)

return the group that this Variable is a member of.

@@ -2985,7 +2953,7 @@

In-memory (diskless) Datasets

ncattrs(unknown):
- +

ncattrs(self)

return netCDF attribute names for this Variable in a list.

@@ -3001,7 +2969,7 @@

In-memory (diskless) Datasets

setncattr(unknown):
- +

setncattr(self,name,value)

set a netCDF variable attribute using name,value pair. Use if you need to set a @@ -3019,7 +2987,7 @@

In-memory (diskless) Datasets

setncattr_string(unknown):
- +

setncattr_string(self,name,value)

set a netCDF variable string attribute using name,value pair. @@ -3038,7 +3006,7 @@

In-memory (diskless) Datasets

setncatts(unknown):
- +

setncatts(self,attdict)

set a bunch of netCDF variable attributes at once using a python dictionary. @@ -3057,7 +3025,7 @@

In-memory (diskless) Datasets

getncattr(unknown):
- +

getncattr(self,name)

retrieve a netCDF variable attribute. Use if you need to set a @@ -3078,7 +3046,7 @@

In-memory (diskless) Datasets

delncattr(unknown):
- +

delncattr(self,name,value)

delete a netCDF variable attribute. Use if you need to delete a @@ -3096,7 +3064,7 @@

In-memory (diskless) Datasets

filters(unknown):
- +

filters(self)

return dictionary containing HDF5 filter parameters.

@@ -3104,19 +3072,20 @@

In-memory (diskless) Datasets

-
-
#   +
+
#   def - quantization(unknown): + significant_digits(unknown):
- -

quantization(self)

+ +

significant_digits(self)

-

return number of significant digits and the algorithm used in quantization. -Returns None if quantization not active.

+

return number of significant digits used in quantization. +if returned value is negative, alternate quantization method +('granular bitgrooming') is used.

@@ -3129,7 +3098,7 @@

In-memory (diskless) Datasets

endian(unknown):
- +

endian(self)

return endian-ness (little,big,native) of variable (as stored in HDF5 file).

@@ -3145,7 +3114,7 @@

In-memory (diskless) Datasets

chunking(unknown):
- +

chunking(self)

return variable chunking information. If the dataset is @@ -3164,7 +3133,7 @@

In-memory (diskless) Datasets

get_var_chunk_cache(unknown):
- +

get_var_chunk_cache(self)

return variable chunk cache information in a tuple (size,nelems,preemption). @@ -3182,7 +3151,7 @@

In-memory (diskless) Datasets

set_var_chunk_cache(unknown):
- +

set_var_chunk_cache(self,size=None,nelems=None,preemption=None)

change variable chunk cache settings. @@ -3200,7 +3169,7 @@

In-memory (diskless) Datasets

renameAttribute(unknown):
- +

renameAttribute(self, oldname, newname)

rename a Variable attribute named oldname to newname.

@@ -3216,7 +3185,7 @@

In-memory (diskless) Datasets

assignValue(unknown):
- +

assignValue(self, val)

assign a value to a scalar variable. Provided for compatibility with @@ -3233,7 +3202,7 @@

In-memory (diskless) Datasets

getValue(unknown):
- +

getValue(self)

get the value of a scalar variable. Provided for compatibility with @@ -3250,7 +3219,7 @@

In-memory (diskless) Datasets

set_auto_chartostring(unknown):
- +

set_auto_chartostring(self,chartostring)

turn on or off automatic conversion of character variable data to and @@ -3281,7 +3250,7 @@

In-memory (diskless) Datasets

use_nc_get_vars(unknown):
- +

use_nc_get_vars(self,_use_get_vars)

enable the use of netcdf library routine nc_get_vars @@ -3301,7 +3270,7 @@

In-memory (diskless) Datasets

set_auto_maskandscale(unknown):
- +

set_auto_maskandscale(self,maskandscale)

turn on or off automatic conversion of variable data to and @@ -3365,7 +3334,7 @@

In-memory (diskless) Datasets

set_auto_scale(unknown):
- +

set_auto_scale(self,scale)

turn on or off automatic packing/unpacking of variable @@ -3414,7 +3383,7 @@

In-memory (diskless) Datasets

set_auto_mask(unknown):
- +

set_auto_mask(self,mask)

turn on or off automatic conversion of variable data to and @@ -3449,7 +3418,7 @@

In-memory (diskless) Datasets

set_always_mask(unknown):
- +

set_always_mask(self,always_mask)

turn on or off conversion of data without missing values to regular @@ -3472,7 +3441,7 @@

In-memory (diskless) Datasets

set_ncstring_attrs(unknown):
- +

set_ncstring_attrs(self,ncstring_attrs)

turn on or off creating NC_STRING string attributes.

@@ -3494,7 +3463,7 @@

In-memory (diskless) Datasets

set_collective(unknown):
- +

set_collective(self,True_or_False)

turn on or off collective parallel IO access. Ignored if file is not @@ -3511,7 +3480,7 @@

In-memory (diskless) Datasets

get_dims(unknown):
- +

get_dims(self)

return a tuple of Dimension instances associated with this @@ -3523,10 +3492,9 @@

In-memory (diskless) Datasets

#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Variable' objects>
-

string name of Variable instance

@@ -3535,10 +3503,9 @@

In-memory (diskless) Datasets

#   - datatype + datatype = <attribute 'datatype' of 'netCDF4._netCDF4.Variable' objects>
-

numpy data type (for primitive data types) or VLType/CompoundType/EnumType instance (for compound, vlen or enum data types)

@@ -3549,10 +3516,9 @@

In-memory (diskless) Datasets

#   - shape + shape = <attribute 'shape' of 'netCDF4._netCDF4.Variable' objects>
-

find current sizes of all variable dimensions

@@ -3561,10 +3527,9 @@

In-memory (diskless) Datasets

#   - size + size = <attribute 'size' of 'netCDF4._netCDF4.Variable' objects>
-

Return the number of stored elements.

@@ -3573,10 +3538,9 @@

In-memory (diskless) Datasets

#   - dimensions + dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Variable' objects>
-

get variable's dimension names

@@ -3585,61 +3549,55 @@

In-memory (diskless) Datasets

#   - ndim + ndim = <attribute 'ndim' of 'netCDF4._netCDF4.Variable' objects>
-
#   - dtype + dtype = <attribute 'dtype' of 'netCDF4._netCDF4.Variable' objects>
-
#   - mask + mask = <attribute 'mask' of 'netCDF4._netCDF4.Variable' objects>
-
#   - scale + scale = <attribute 'scale' of 'netCDF4._netCDF4.Variable' objects>
-
#   - always_mask + always_mask = <attribute 'always_mask' of 'netCDF4._netCDF4.Variable' objects>
-
#   - chartostring + chartostring = <attribute 'chartostring' of 'netCDF4._netCDF4.Variable' objects>
-
@@ -3652,7 +3610,7 @@

In-memory (diskless) Datasets

Dimension:
- +

A netCDF Dimension is used to describe the coordinates of a Variable. See Dimension.__init__ for more details.

@@ -3678,7 +3636,7 @@

In-memory (diskless) Datasets

Dimension()
- +

__init__(self, group, name, size=None)

Dimension constructor.

@@ -3704,7 +3662,7 @@

In-memory (diskless) Datasets

group(unknown):
- +

group(self)

return the group that this Dimension is a member of.

@@ -3720,7 +3678,7 @@

In-memory (diskless) Datasets

isunlimited(unknown):
- +

isunlimited(self)

returns True if the Dimension instance is unlimited, False otherwise.

@@ -3731,10 +3689,9 @@

In-memory (diskless) Datasets

#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Dimension' objects>
-

string name of Dimension instance

@@ -3743,10 +3700,9 @@

In-memory (diskless) Datasets

#   - size + size = <attribute 'size' of 'netCDF4._netCDF4.Dimension' objects>
-

current size of Dimension (calls len on Dimension instance)

@@ -3762,7 +3718,7 @@

In-memory (diskless) Datasets

Group(netCDF4.Dataset):
- +

Groups define a hierarchical namespace within a netCDF file. They are analogous to directories in a unix filesystem. Each Group behaves like a Dataset within a Dataset, and can contain its own variables, @@ -3786,7 +3742,7 @@

In-memory (diskless) Datasets

Group()
- +

__init__(self, parent, name) Group constructor.

@@ -3810,7 +3766,7 @@

In-memory (diskless) Datasets

close(unknown):
- +

close(self)

overrides Dataset close method which does not apply to Group @@ -3880,7 +3836,7 @@

Inherited Members
MFDataset(netCDF4.Dataset):
- +

Class for reading multi-file netCDF Datasets, making variables spanning multiple files appear as if they were in one file. Datasets must be in NETCDF4_CLASSIC, NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET @@ -3890,7 +3846,7 @@

Inherited Members

Example usage (See MFDataset.__init__ for more details):

-
>>> import numpy as np
+
>>> import numpy as np
 >>> # create a series of netCDF files with a variable sharing
 >>> # the same unlimited dimension.
 >>> for nf in range(10):
@@ -3917,7 +3873,7 @@ 
Inherited Members
MFDataset(files, check=False, aggdim=None, exclude=[], master_file=None)
- +

__init__(self, files, check=False, aggdim=None, exclude=[], master_file=None)

@@ -3962,7 +3918,7 @@
Inherited Members
ncattrs(self):
- +

ncattrs(self)

return the netcdf attribute names from the master file.

@@ -3978,7 +3934,7 @@
Inherited Members
close(self):
- +

close(self)

close all the open files.

@@ -4046,13 +4002,13 @@
Inherited Members
MFTime(netCDF4._netCDF4._Variable):
- +

Class providing an interface to a MFDataset time Variable by imposing a unique common time unit and/or calendar to all files.

Example usage (See MFTime.__init__ for more details):

-
>>> import numpy as np
+
>>> import numpy as np
 >>> f1 = Dataset("mftest_1.nc","w", format="NETCDF4_CLASSIC")
 >>> f2 = Dataset("mftest_2.nc","w", format="NETCDF4_CLASSIC")
 >>> f1.createDimension("time",None)
@@ -4088,7 +4044,7 @@ 
Inherited Members
MFTime(time, units=None, calendar=None)
- +

__init__(self, time, units=None, calendar=None)

Create a time Variable with units consistent across a multifile @@ -4132,7 +4088,7 @@

Inherited Members
CompoundType:
- +

A CompoundType instance is used to describe a compound data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. @@ -4151,7 +4107,7 @@

Inherited Members
CompoundType()
- +

__init__(group, datatype, datatype_name)

CompoundType constructor.

@@ -4180,31 +4136,28 @@
Inherited Members
#   - dtype + dtype = <attribute 'dtype' of 'netCDF4._netCDF4.CompoundType' objects>
-
#   - dtype_view + dtype_view = <attribute 'dtype_view' of 'netCDF4._netCDF4.CompoundType' objects>
-
#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.CompoundType' objects>
-
@@ -4217,7 +4170,7 @@
Inherited Members
VLType:
- +

A VLType instance is used to describe a variable length (VLEN) data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. See @@ -4235,7 +4188,7 @@

Inherited Members
VLType()
- +

__init__(group, datatype, datatype_name)

VLType constructor.

@@ -4258,21 +4211,19 @@
Inherited Members
#   - dtype + dtype = <attribute 'dtype' of 'netCDF4._netCDF4.VLType' objects>
-
#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.VLType' objects>
-
@@ -4284,7 +4235,7 @@
Inherited Members
date2num(unknown):
- +

date2num(dates, units, calendar=None, has_year_zero=None)

Return numeric time values given datetime objects. The units @@ -4344,7 +4295,7 @@

Inherited Members
num2date(unknown):
- +

num2date(times, units, calendar=u'standard', only_use_cftime_datetimes=True, only_use_python_datetimes=False, has_year_zero=None)

Return datetime objects given numeric time values. The units @@ -4416,7 +4367,7 @@

Inherited Members
date2index(unknown):
- +

date2index(dates, nctime, calendar=None, select=u'exact', has_year_zero=None)

Return indices of a netCDF time variable corresponding to the given dates.

@@ -4470,7 +4421,7 @@
Inherited Members
stringtochar(unknown):
- +

stringtochar(a,encoding='utf-8')

convert a string array to a character array with one extra dimension

@@ -4497,7 +4448,7 @@
Inherited Members
chartostring(unknown):
- +

chartostring(b,encoding='utf-8')

convert a character array to a string array with one less dimension.

@@ -4524,7 +4475,7 @@
Inherited Members
stringtoarr(unknown):
- +

stringtoarr(a, NUMCHARS,dtype='S')

convert a string to a character array of length NUMCHARS

@@ -4552,7 +4503,7 @@
Inherited Members
getlibversion(unknown):
- +

getlibversion()

returns a string describing the version of the netcdf library @@ -4570,7 +4521,7 @@

Inherited Members
EnumType:
- +

An EnumType instance is used to describe an Enum data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. See @@ -4588,7 +4539,7 @@

Inherited Members
EnumType()
- +

__init__(group, datatype, datatype_name, enum_dict)

EnumType constructor.

@@ -4614,31 +4565,28 @@
Inherited Members
#   - dtype + dtype = <attribute 'dtype' of 'netCDF4._netCDF4.EnumType' objects>
-
#   - name + name = <attribute 'name' of 'netCDF4._netCDF4.EnumType' objects>
-
#   - enum_dict + enum_dict = <attribute 'enum_dict' of 'netCDF4._netCDF4.EnumType' objects>
-
@@ -4650,7 +4598,7 @@
Inherited Members
get_chunk_cache(unknown):
- +

get_chunk_cache()

return current netCDF chunk cache information in a tuple (size,nelems,preemption). @@ -4668,7 +4616,7 @@

Inherited Members
set_chunk_cache(unknown):
- +

set_chunk_cache(self,size=None,nelems=None,preemption=None)

change netCDF4 chunk cache settings. From 7705c1ebaa6a868e31fa0b5991560349c2c10355 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 3 Apr 2022 20:51:34 -0600 Subject: [PATCH 05/92] update --- docs/index.html | 114 +++++++++++++++++++++++++++++------------------- 1 file changed, 68 insertions(+), 46 deletions(-) diff --git a/docs/index.html b/docs/index.html index 8213f510f..3d402ea44 100644 --- a/docs/index.html +++ b/docs/index.html @@ -223,7 +223,7 @@

API Documentation

filters
  • - significant_digits + quantization
  • endian @@ -474,7 +474,7 @@

    Introduction

    and should be familiar to users of that module.

    Most new features of netCDF 4 are implemented, such as multiple -unlimited dimensions, groups and zlib data compression. All the new +unlimited dimensions, groups and data compression. All the new numeric data types (such as 64 bit and unsigned integer types) are implemented. Compound (struct), variable length (vlen) and enumerated (enum) data types are supported, but not the opaque data type. @@ -1078,9 +1078,9 @@

    Efficient compression of netC

    Data stored in netCDF 4 Variable objects can be compressed and decompressed on the fly. The parameters for the compression are -determined by the zlib, complevel and shuffle keyword arguments +determined by the compression, complevel and shuffle keyword arguments to the Dataset.createVariable method. To turn on -compression, set zlib=True. The complevel keyword regulates the +compression, set compression=zlib. The complevel keyword regulates the speed and efficiency of the compression (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of complevel is 4. Setting shuffle=False will turn @@ -1100,7 +1100,7 @@

    Efficient compression of netC

    If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of -0.1 degrees), you can dramatically improve zlib compression by +0.1 degrees), you can dramatically improve compression by quantizing (or truncating) the data. There are two methods supplied for doing this. You can use the least_significant_digit keyword argument to Dataset.createVariable to specify @@ -1110,7 +1110,7 @@

    Efficient compression of netC data the data to be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). This is done at the python level and is -not a part of the underlying C library. Starting with netcdf-c version 4.8.2, +not a part of the underlying C library. Starting with netcdf-c version 4.9.0, a quantization capability is provided in the library. This can be used via the significant_digits Dataset.createVariable kwarg (new in version 1.6.0). @@ -1128,17 +1128,17 @@

    Efficient compression of netC

    with

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
     

    and then

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True,least_significant_digit=3)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
     
    -

    or with netcdf-c >= 4.8.2

    +

    or with netcdf-c >= 4.9.0

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),zlib=True,significant_digits=4)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
     

    and see how much smaller the resulting files are.

    @@ -1612,7 +1612,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -2053,9 +2053,10 @@

    In-memory (diskless) Datasets

    -

    createVariable(self, varname, datatype, dimensions=(), zlib=False, +

    createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, -endian='native', least_significant_digit=None, significant_digits=None, fill_value=None, chunk_cache=None)

    +endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', +fill_value=None, chunk_cache=None)

    Creates a new variable with the given varname, datatype, and dimensions. If dimensions are not given, the variable is assumed to be @@ -2087,11 +2088,17 @@

    In-memory (diskless) Datasets

    previously using Dataset.createDimension. The default value is an empty tuple, which means the variable is a scalar.

    +

    If the optional keyword argument compression is set, the data will be +compressed in the netCDF file using the specified compression algorithm. +Currently only 'zlib' is supported. Default is None (no compression).

    +

    If the optional keyword zlib is True, the data will be compressed in -the netCDF file using gzip compression (default False).

    +the netCDF file using zlib compression (default False). The use of this option is +deprecated in favor of compression='zlib'.

    -

    The optional keyword complevel is an integer between 1 and 9 describing -the level of compression desired (default 4). Ignored if zlib=False.

    +

    The optional keyword complevel is an integer between 0 and 9 describing +the level of compression desired (default 4). Ignored if compression=None. +A value of zero disables compression.

    If the optional keyword shuffle is True, the HDF5 shuffle filter will be applied before compressing the data (default True). This @@ -2125,7 +2132,7 @@

    In-memory (diskless) Datasets

    opposite format as the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness.

    -

    The zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian +

    The compression, zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian keywords are silently ignored for netCDF 3 files that do not use HDF5.

    The optional keyword fill_value can be used to override the default @@ -2135,7 +2142,7 @@

    In-memory (diskless) Datasets

    If the optional keyword parameters least_significant_digit or significant_digits are specified, variable data will be truncated (quantized). In conjunction -with zlib=True this produces 'lossy', but significantly more +with compression='zlib' this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using numpy.around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is @@ -2145,10 +2152,11 @@

    In-memory (diskless) Datasets

    in unpacked data that is a reliable value." Default is None, or no quantization, or 'lossless' compression. If significant_digits=3 then the data will be quantized so that three significant digits are retained, independent -of the floating point exponent. If significant_digits is given as a negative -number, then an alternate algorithm for quantization ('granular bitgrooming') is used -that may result in better compression for typical geophysical datasets. -This significant_digits kwarg is only available with netcdf-c >= 4.8.2, and +of the floating point exponent. The keyword argument quantize_mode controls +the quantization algorithm (default 'BitGroom', 'BitRound' and +'GranularBitRound' also available). The 'GranularBitRound' +algorithm may result in better compression for typical geophysical datasets. +This significant_digits kwarg is only available with netcdf-c >= 4.9.0, and only works with NETCDF4 or NETCDF4_CLASSIC formatted files.
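    For illustration (not part of this patch), a short sketch of the two quantization modes described above; requires netcdf-c >= 4.9.0 and a NETCDF4 or NETCDF4_CLASSIC file, and the names used are placeholders:

        import numpy as np
        from netCDF4 import Dataset

        nc = Dataset('quantized.nc', 'w', format='NETCDF4')
        nc.createDimension('n', 1000)
        x = np.random.uniform(size=1000)
        # default 'BitGroom': retain ~3 significant decimal digits
        v1 = nc.createVariable('x_bitgroom', 'f8', ('n',), compression='zlib',
                               significant_digits=3)
        # 'BitRound': significant_digits counts binary (not decimal) digits
        v2 = nc.createVariable('x_bitround', 'f8', ('n',), compression='zlib',
                               significant_digits=10, quantize_mode='BitRound')
        v1[:] = x
        v2[:] = x
        nc.close()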

    When creating variables in a NETCDF4 or NETCDF4_CLASSIC formatted file, @@ -2788,16 +2796,22 @@

    In-memory (diskless) Datasets

    truncated to this decimal place when it is assigned to the Variable instance. If None, the data is not truncated.

    -

    significant_digits: New in version 1.6.0. Describes the number of significant +

    significant_digits: New in version 1.6.0. Describes the number of significant digits in the data the contains a reliable value. Data is truncated to retain this number of significant digits when it is assigned to the Variable instance. If None, the data is not truncated. -If specified as a negative number, an alternative quantization algorithm is used -that often produces better compression. -Only available with netcdf-c >= 4.8.2, +Only available with netcdf-c >= 4.9.0, and only works with NETCDF4 or NETCDF4_CLASSIC formatted files. The number of significant digits used in the quantization of variable data can be -obtained using the Variable.significant_digits method.

    +obtained using the Variable.significant_digits method. Default None - +no quantization done.

    + +

    quantize_mode: New in version 1.6.0. Controls +the quantization algorithm (default 'BitGroom', 'BitRound' and +'GranularBitRound' also available). The 'GranularBitRound' +algorithm may result in better compression for typical geophysical datasets. +Ignored if significant_digits not specified. If 'BitRound' is used, then +significant_digits is interpreted as binary (not decimal) digits.

    __orthogonal_indexing__: Always True. Indicates to client code that the object supports 'orthogonal indexing', which means that slices @@ -2821,7 +2835,7 @@

    In-memory (diskless) Datasets

    -

    __init__(self, group, name, datatype, dimensions=(), zlib=False, +

    __init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)

    @@ -2855,15 +2869,19 @@

    In-memory (diskless) Datasets

    (defined previously with createDimension). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).

    +

    compression: compression algorithm to use. Default None. Currently +only 'zlib' is supported.

    +

    zlib: if True, data assigned to the Variable -instance is compressed on disk. Default False.

    +instance is compressed on disk. Default False. Deprecated - use +compression='zlib' instead.

    -

    complevel: the level of zlib compression to use (1 is the fastest, +

    complevel: the level of compression to use (1 is the fastest, but poorest compression, 9 is the slowest but best compression). Default 4. -Ignored if zlib=False.

    +Ignored if compression=None. A value of 0 disables compression.

    shuffle: if True, the HDF5 shuffle filter is applied -to improve compression. Default True. Ignored if zlib=False.

    +to improve compression. Default True. Ignored if compression=None.

    fletcher32: if True (default False), the Fletcher32 checksum algorithm is used for error detection.

    @@ -2893,24 +2911,29 @@

    In-memory (diskless) Datasets

    some performance advantage to be gained by setting the endian-ness. For netCDF 3 files (that don't use HDF5), only endian='native' is allowed.

    -

    The zlib, complevel, shuffle, fletcher32, contiguous and chunksizes +

    The compression, zlib, complevel, shuffle, fletcher32, contiguous and chunksizes keywords are silently ignored for netCDF 3 files that do not use HDF5.

    -

    least_significant_digit: If this or significant_digits are specified, +

    least_significant_digit: If this or significant_digits are specified, variable data will be truncated (quantized).
    -In conjunction with zlib=True this produces +In conjunction with compression='zlib' this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using around(scale*data)/scale, where scale = 2**bits, and bits is determined so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

    -

    significant_digits: New in version 1.6.0. +

    significant_digits: New in version 1.6.0. As described for least_significant_digit except the number of significant digits retained is prescribed independent -of the floating point exponent. If specified as a negative number, -an alternative quantization algorithm is used that often produces -better compression. Only available with netcdf-c >= 4.8.2.

    +of the floating point exponent. Default None - no quantization done.

    + +

    quantize_mode: New in version 1.6.0. Controls +the quantization algorithm (default 'BitGroom', 'BitRound' and +'GranularBitRound' also available). The 'GranularBitRound' +algorithm may result in better compression for typical geophysical datasets. +Ignored if significant_digits not specified. If 'BitRound' is used, then +significant_digits is interpreted as binary (not decimal) digits.

    fill_value: If specified, the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it) @@ -3072,20 +3095,19 @@

    In-memory (diskless) Datasets

    -
    -
    #   +
    +
    #   def - significant_digits(unknown): + quantization(unknown):
    -

    significant_digits(self)

    +

    quantization(self)

    -

    return number of significant digits used in quantization. -if returned value is negative, alternate quantization method -('granular bitgrooming') is used.

    +

    return number of significant digits and the algorithm used in quantization. +Returns None if quantization not active.
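    For illustration (not part of this patch), reading the settings back from a file written as in the sketches above; the exact keys reported by filters() depend on how the underlying netcdf-c library was built:

        from netCDF4 import Dataset

        nc = Dataset('quantized.nc')
        var = nc.variables['x_bitgroom']
        print(var.filters())        # compression/shuffle/complevel flags
        print(var.quantization())   # significant digits and algorithm, or None
        nc.close()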

    From 1e8101d7c5e704e38df752dd0e00fd65e70aa9e9 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Mon, 4 Apr 2022 08:20:51 -0600 Subject: [PATCH 06/92] remove -oversubscripe option (conda installs mpich now) --- .github/workflows/miniconda.yml | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/.github/workflows/miniconda.yml b/.github/workflows/miniconda.yml index 23e7e1fc4..2d27d3168 100644 --- a/.github/workflows/miniconda.yml +++ b/.github/workflows/miniconda.yml @@ -78,7 +78,8 @@ jobs: export PATH="${CONDA_PREFIX}/bin:${CONDA_PREFIX}/Library/bin:$PATH" which mpirun mpirun --version - mpirun -np 4 --oversubscribe python mpi_example.py + #mpirun -np 4 --oversubscribe python mpi_example.py # for openmpi + mpirun -np 4 python mpi_example.py if [ $? -ne 0 ] ; then echo "hdf5 mpi test failed!" exit 1 From cb63f57c71db70b821c4868e91935ef2768dc251 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sat, 23 Apr 2022 22:07:55 -0600 Subject: [PATCH 07/92] install bzip2, zstandard and blosc --- .github/workflows/build_master.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 69a818ba3..f36339695 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -23,7 +23,7 @@ jobs: - name: Install Ubuntu Dependencies run: | sudo apt-get update - sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev + sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libblosc-dev zstd echo "Download and build netCDF github master" git clone https://github.com/Unidata/netcdf-c pushd netcdf-c From b0a42465f309808984ace54d8bf3dcb85c343731 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 07:54:00 -0600 Subject: [PATCH 08/92] install libzstd-dev --- .github/workflows/build_master.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index f36339695..81c43b455 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -23,7 +23,7 @@ jobs: - name: Install Ubuntu Dependencies run: | sudo apt-get update - sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libblosc-dev zstd + sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libblosc-dev libzstd-dev echo "Download and build netCDF github master" git clone https://github.com/Unidata/netcdf-c pushd netcdf-c From 5c6caa747dd2f95948f3ce5ed54c025d07ba2f8c Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 08:44:32 -0600 Subject: [PATCH 09/92] enable bzip2,zstd compression --- include/netCDF4.pxi | 15 +++++++++++++ setup.py | 46 +++++++++++++++++++++++++++++++++++++--- src/netCDF4/_netCDF4.pyx | 44 +++++++++++++++++++++++++++----------- 3 files changed, 90 insertions(+), 15 deletions(-) diff --git a/include/netCDF4.pxi b/include/netCDF4.pxi index d26d4991c..c743218f6 100644 --- a/include/netCDF4.pxi +++ b/include/netCDF4.pxi @@ -701,6 +701,21 @@ IF HAS_QUANTIZATION_SUPPORT: int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd) int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp) nogil +IF HAS_ZSTANDARD_SUPPORT: + cdef extern from "netcdf_filter.h": + int nc_def_var_zstandard(int ncid, int varid, int level) + int nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int *levelp) + +IF HAS_BZIP2_SUPPORT: + cdef 
extern from "netcdf_filter.h": + int nc_def_var_bzip2(int ncid, int varid, int level) + int nc_inq_var_bzip2(int ncid, int varid, int* hasfilterp, int *levelp) + +IF HAS_BLOSC_SUPPORT: + cdef extern from "netcdf_filter.h": + int nc_def_var_blosc(int ncid, int varid, unsigned subcompressor, unsigned level, unsigned blocksize, unsigned addshuffle) + int nc_inq_var_blosc(int ncid, int varid, int* hasfilterp, unsigned* subcompressorp, unsigned* levelp, unsigned* blocksizep, unsigned* addshufflep) + IF HAS_NC_OPEN_MEM: cdef extern from "netcdf_mem.h": int nc_open_mem(const char *path, int mode, size_t size, void* memory, int *ncidp) diff --git a/setup.py b/setup.py index 76eceb455..5213af649 100644 --- a/setup.py +++ b/setup.py @@ -66,6 +66,9 @@ def check_api(inc_dirs,netcdf_lib_version): has_parallel4_support = False has_pnetcdf_support = False has_quantize = False + has_zstandard = False + has_bzip2 = False + has_blosc = False for d in inc_dirs: try: @@ -74,6 +77,7 @@ def check_api(inc_dirs,netcdf_lib_version): continue has_nc_open_mem = os.path.exists(os.path.join(d, 'netcdf_mem.h')) + has_nc_filter = os.path.exists(os.path.join(d, 'netcdf_filter.h')) for line in f: if line.startswith('nc_rename_grp'): @@ -96,6 +100,19 @@ def check_api(inc_dirs,netcdf_lib_version): if line.startswith('EXTERNL int nc_create_mem'): has_nc_create_mem = True + if has_nc_filter: + try: + f = open(os.path.join(d, 'netcdf_filter.h'), **open_kwargs) + except IOError: + continue + for line in f: + if line.startswith('EXTERNL int nc_def_var_zstandard'): + has_zstandard = True + if line.startswith('EXTERNL int nc_def_var_bzip2'): + has_bzip2 = True + if line.startswith('EXTERNL int nc_def_var_blosc'): + has_blosc = True + ncmetapath = os.path.join(d,'netcdf_meta.h') if os.path.exists(ncmetapath): for line in open(ncmetapath): @@ -119,7 +136,8 @@ def check_api(inc_dirs,netcdf_lib_version): return has_rename_grp, has_nc_inq_path, has_nc_inq_format_extended, \ has_cdf5_format, has_nc_open_mem, has_nc_create_mem, \ - has_parallel4_support, has_pnetcdf_support, has_quantize + has_parallel4_support, has_pnetcdf_support, has_quantize, \ + has_zstandard, has_bzip2, has_blosc def getnetcdfvers(libdirs): @@ -532,7 +550,8 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): # this determines whether renameGroup and filepath methods will work. has_rename_grp, has_nc_inq_path, has_nc_inq_format_extended, \ has_cdf5_format, has_nc_open_mem, has_nc_create_mem, \ - has_parallel4_support, has_pnetcdf_support, has_quantize = \ + has_parallel4_support, has_pnetcdf_support, has_quantize, \ + has_zstandard, has_bzip2, has_blosc = \ check_api(inc_dirs,netcdf_lib_version) # for netcdf 4.4.x CDF5 format is always enabled. 
if netcdf_lib_version is not None and\ @@ -608,9 +627,30 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): sys.stdout.write('netcdf lib has bit-grooming/quantization functions\n') f.write('DEF HAS_QUANTIZATION_SUPPORT = 1\n') else: - sys.stdout.write('netcdf lib does not bit-grooming/quantization functions\n') + sys.stdout.write('netcdf lib does not have bit-grooming/quantization functions\n') f.write('DEF HAS_QUANTIZATION_SUPPORT = 0\n') + if has_zstandard: + sys.stdout.write('netcdf lib has zstandard compression functions\n') + f.write('DEF HAS_ZSTANDARD_SUPPORT = 1\n') + else: + sys.stdout.write('netcdf lib does not have zstandard compression functions\n') + f.write('DEF HAS_ZSTANDARD_SUPPORT = 0\n') + + if has_bzip2: + sys.stdout.write('netcdf lib has bzip2 compression functions\n') + f.write('DEF HAS_BZIP2_SUPPORT = 1\n') + else: + sys.stdout.write('netcdf lib does not have bzip2 compression functions\n') + f.write('DEF HAS_BZIP2_SUPPORT = 0\n') + + if has_blosc: + sys.stdout.write('netcdf lib has blosc compression functions\n') + f.write('DEF HAS_BLOSC_SUPPORT = 1\n') + else: + sys.stdout.write('netcdf lib does not have blosc compression functions\n') + f.write('DEF HAS_BLOSC_SUPPORT = 0\n') + f.close() if has_parallel4_support or has_pnetcdf_support: diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 09f037640..032fd9417 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2679,7 +2679,7 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently only 'zlib' is supported. Default is `None` (no compression). +Currently 'zlib','zstd' and 'bzip2' are supported. Default is `None` (no compression). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -3684,7 +3684,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. which means the variable is a scalar (and therefore has no dimensions). **`compression`**: compression algorithm to use. Default None. Currently - only 'zlib' is supported. + 'zlib','zstd' and 'bzip2' are supported. **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. Deprecated - use @@ -3780,13 +3780,15 @@ behavior is similar to Fortran or Matlab, but different than numpy. # if complevel is set to zero, turn off compression if not complevel: compression = None - # possible future options include 'zstd' and 'bzip2', zlib = False - #zstd = False + zstd = False + bzip2 = False if compression == 'zlib': zlib = True - #elif compression == 'zstd': - # zstd = True + elif compression == 'zstd': + zstd = True + elif compression == 'bzip2': + bzip2 = True elif not compression: compression = None # if compression evaluates to False, set to None. pass @@ -3924,12 +3926,30 @@ behavior is similar to Fortran or Matlab, but different than numpy. 
if ierr != NC_NOERR: if grp.data_model != 'NETCDF4': grp._enddef() _ensure_nc_success(ierr) - #if zstd: - # icomplevel = complevel - # ierr = nc_def_var_zstandard(self._grpid, self._varid, icomplevel) - # if ierr != NC_NOERR: - # if grp.data_model != 'NETCDF4': grp._enddef() - # _ensure_nc_success(ierr) + if zstd: + IF HAS_ZSTANDARD_SUPPORT: + icomplevel = complevel + ierr = nc_def_var_zstandard(self._grpid, self._varid, icomplevel) + if ierr != NC_NOERR: + if grp.data_model != 'NETCDF4': grp._enddef() + _ensure_nc_success(ierr) + ELSE: + msg = """ +compression='zstd' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have +version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-python.""" + raise ValueError(msg) + if bzip2: + IF HAS_BZIP2_SUPPORT: + icomplevel = complevel + ierr = nc_def_var_bzip2(self._grpid, self._varid, icomplevel) + if ierr != NC_NOERR: + if grp.data_model != 'NETCDF4': grp._enddef() + _ensure_nc_success(ierr) + ELSE: + msg = """ +compression='bzip2' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have +version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python.""" + raise ValueError(msg) # set checksum. if fletcher32 and ndims: # don't bother for scalar variable ierr = nc_def_var_fletcher32(self._grpid, self._varid, 1) From dd4de90c9b7f62c4c6d072854880d44d2ddafe6d Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 08:46:31 -0600 Subject: [PATCH 10/92] update --- Changelog | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/Changelog b/Changelog index 92968f30c..d3cbac8b2 100644 --- a/Changelog +++ b/Changelog @@ -1,6 +1,6 @@ version 1.6.0 (tag v1.6.0rel) ============================== - * add support for new quantization functionality in netcdf-c 4.8.2 via "signficant_digits" + * add support for new quantization functionality in netcdf-c 4.9.0 via "signficant_digits" and "quantize_mode" kwargs in Dataset.createVariable. Default quantization_mode is "BitGroom", but alternate methods "BitRound" and GranularBitRound" also supported. * opening a Dataset in append mode (mode = 'a' or 'r+') creates a Dataset @@ -11,10 +11,12 @@ names in "dimensions" tuple kwarg (issue #1145). * remove all vestiges of python 2 in _netCDF4.pyx and set cython language_level directive to 3 in setup.py. - * add 'compression' kwarg to createVariable. Only 'None' and 'zlib' currently + * add 'compression' kwarg to createVariable to enable new compression + functionality in netcdf-c 4.9.0. 'None', 'zlib', 'zstd' and 'bzip2 are currently allowed (compression='zlib' is equivalent to zlib=True), but allows for new compression algorithms to be added when they become available - in netcdf-c. The 'zlib' kwarg is now deprecated. + in netcdf-c. The 'zlib' kwarg is now deprecated. Blosc compression feature + in netcdf-c 4.9.0 not yet supported. * MFDataset did not aggregate 'name' variable attribute (issue #1153). 
* issue warning instead of raising an exception if missing_value or _FillValue can't be cast to the variable type when creating a From 8732ca901489c2beaa83f14eca4d5bfbaa9f329b Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 13:01:42 -0600 Subject: [PATCH 11/92] update --- include/netCDF4.pxi | 5 +++++ src/netCDF4/__init__.py | 4 +++- src/netCDF4/_netCDF4.pyx | 2 ++ 3 files changed, 10 insertions(+), 1 deletion(-) diff --git a/include/netCDF4.pxi b/include/netCDF4.pxi index c743218f6..a5ff730be 100644 --- a/include/netCDF4.pxi +++ b/include/netCDF4.pxi @@ -703,11 +703,16 @@ IF HAS_QUANTIZATION_SUPPORT: IF HAS_ZSTANDARD_SUPPORT: cdef extern from "netcdf_filter.h": + cdef enum: + H5Z_FILTER_ZSTANDARD int nc_def_var_zstandard(int ncid, int varid, int level) int nc_inq_var_zstandard(int ncid, int varid, int* hasfilterp, int *levelp) + int nc_inq_filter_avail(int ncid, unsigned id) IF HAS_BZIP2_SUPPORT: cdef extern from "netcdf_filter.h": + cdef enum: + H5Z_FILTER_BZIP2 int nc_def_var_bzip2(int ncid, int varid, int level) int nc_inq_var_bzip2(int ncid, int varid, int* hasfilterp, int *levelp) diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py index 4888e7c26..f9fe6a736 100644 --- a/src/netCDF4/__init__.py +++ b/src/netCDF4/__init__.py @@ -7,6 +7,8 @@ __has_rename_grp__, __has_nc_inq_path__, __has_nc_inq_format_extended__, __has_nc_open_mem__, __has_nc_create_mem__, __has_cdf5_format__, - __has_parallel4_support__, __has_pnetcdf_support__,__has_quantization_support__) + __has_parallel4_support__, __has_pnetcdf_support__, + __has_quantization_support__, __has_zstandard_support__, + __has_bzip2_support__) __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 032fd9417..ec2570de9 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1321,6 +1321,8 @@ __has_nc_create_mem__ = HAS_NC_CREATE_MEM __has_parallel4_support__ = HAS_PARALLEL4_SUPPORT __has_pnetcdf_support__ = HAS_PNETCDF_SUPPORT __has_quantization_support__ = HAS_QUANTIZATION_SUPPORT +__has_zstandard_support__ = HAS_ZSTANDARD_SUPPORT +__has_bzip2_support__ = HAS_BZIP2_SUPPORT _needsworkaround_issue485 = __netcdf4libversion__ < "4.4.0" or \ (__netcdf4libversion__.startswith("4.4.0") and \ "-development" in __netcdf4libversion__) From 8506b4fad7ebdbed21c0ed3b2bafffb49df8e16f Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 13:30:59 -0600 Subject: [PATCH 12/92] update filters method --- src/netCDF4/_netCDF4.pyx | 17 +++++++++++++++-- 1 file changed, 15 insertions(+), 2 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index ec2570de9..0c9e0febc 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4334,7 +4334,9 @@ attributes.""" return dictionary containing HDF5 filter parameters.""" cdef int ierr,ideflate,ishuffle,icomplevel,ifletcher32 - filtdict = {'compression':None,'zlib':False,'shuffle':False,'complevel':0,'fletcher32':False} + cdef int izstd=0 + cdef int ibzip2=0 + filtdict = {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} if self._grp.data_model not in ['NETCDF4_CLASSIC','NETCDF4']: return with nogil: ierr = nc_inq_var_deflate(self._grpid, self._varid, &ishuffle, &ideflate, &icomplevel) @@ -4342,10 +4344,21 @@ return 
dictionary containing HDF5 filter parameters.""" with nogil: ierr = nc_inq_var_fletcher32(self._grpid, self._varid, &ifletcher32) _ensure_nc_success(ierr) + IF HAS_ZSTANDARD_SUPPORT: + ierr = nc_inq_var_zstandard(self._grpid, self._varid, &izstd, &icomplevel) + _ensure_nc_success(ierr) + IF HAS_BZIP2_SUPPORT: + ierr = nc_inq_var_bzip2(self._grpid, self._varid, &ibzip2, &icomplevel) + _ensure_nc_success(ierr) if ideflate: - filtdict['compression']='zlib' filtdict['zlib']=True filtdict['complevel']=icomplevel + if izstd: + filtdict['zstd']=True + filtdict['complevel']=icomplevel + if ibzip2: + filtdict['bzip2']=True + filtdict['complevel']=icomplevel if ishuffle: filtdict['shuffle']=True if ifletcher32: From 9815b3f0c4975e1dbdac5e6221989c2ff6a852d7 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 13:31:48 -0600 Subject: [PATCH 13/92] update --- test/tst_compression.py | 17 ++++++++++------- ...compression2.py => tst_compression_quant.py} | 7 ++++--- 2 files changed, 14 insertions(+), 10 deletions(-) rename test/{tst_compression2.py => tst_compression_quant.py} (93%) diff --git a/test/tst_compression.py b/test/tst_compression.py index b39fabd7d..3aa672017 100644 --- a/test/tst_compression.py +++ b/test/tst_compression.py @@ -97,8 +97,8 @@ def runTest(self): size = os.stat(self.files[1]).st_size assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'compression':'zlib','zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} - assert f.variables['data2'].filters() == {'compression':'zlib','zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} + assert f.variables['data'].filters() == {'zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} + assert f.variables['data2'].filters() == {'zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -106,8 +106,8 @@ def runTest(self): size = os.stat(self.files[2]).st_size assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} - assert f.variables['data2'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} + assert f.variables['data'].filters() == {'zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} + assert f.variables['data2'].filters() == {'zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle @@ -131,8 +131,10 @@ def runTest(self): size = os.stat(self.files[5]).st_size assert_almost_equal(checkarray,f.variables['data'][:]) assert_almost_equal(checkarray,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':6,'fletcher32':True} - assert f.variables['data2'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':6,'fletcher32':True} + assert f.variables['data'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} + assert f.variables['data2'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} assert(size < 0.20*uncompressed_size) # should be slightly larger than without fletcher32 assert(size > size_save) @@ -141,7 +143,8 @@ def 
runTest(self): f = Dataset(self.files[6]) checkarray2 = _quantize(array2,lsd) assert_almost_equal(checkarray2,f.variables['data2'][:]) - assert f.variables['data2'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':6,'fletcher32':True} + assert f.variables['data2'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} assert f.variables['data2'].chunking() == [chunk1,chunk2] f.close() diff --git a/test/tst_compression2.py b/test/tst_compression_quant.py similarity index 93% rename from test/tst_compression2.py rename to test/tst_compression_quant.py index a7e4929b8..3fb42c298 100644 --- a/test/tst_compression2.py +++ b/test/tst_compression_quant.py @@ -1,6 +1,5 @@ from numpy.random.mtrand import uniform from netCDF4 import Dataset -from netCDF4.utils import _quantize from numpy.testing import assert_almost_equal import numpy as np import os, tempfile, unittest @@ -59,7 +58,8 @@ def runTest(self): size = os.stat(self.files[1]).st_size #print('compressed lossless no shuffle = ',size) assert_almost_equal(data_array,f.variables['data'][:]) - assert f.variables['data'].filters() == {'compression':'zlib','zlib':True,'shuffle':False,'complevel':complevel,'fletcher32':False} + assert f.variables['data'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':complevel,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -67,7 +67,8 @@ def runTest(self): size = os.stat(self.files[2]).st_size #print('compressed lossless with shuffle ',size) assert_almost_equal(data_array,f.variables['data'][:]) - assert f.variables['data'].filters() == {'compression':'zlib','zlib':True,'shuffle':True,'complevel':complevel,'fletcher32':False} + assert f.variables['data'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':complevel,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle From b9d02bcfc29dc577347a4ed9ca683b8daa56fc54 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 13:32:15 -0600 Subject: [PATCH 14/92] tests for zstd and bzip2 filters --- test/tst_compression_bzip2.py | 53 +++++++++++++++++++++++++++++++++++ test/tst_compression_zstd.py | 53 +++++++++++++++++++++++++++++++++++ 2 files changed, 106 insertions(+) create mode 100644 test/tst_compression_bzip2.py create mode 100644 test/tst_compression_zstd.py diff --git a/test/tst_compression_bzip2.py b/test/tst_compression_bzip2.py new file mode 100644 index 000000000..56c8e0f5b --- /dev/null +++ b/test/tst_compression_bzip2.py @@ -0,0 +1,53 @@ +from numpy.random.mtrand import uniform +from netCDF4 import Dataset +from numpy.testing import assert_almost_equal +import os, tempfile, unittest + +ndim = 100000 +filename1 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +filename2 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +array = uniform(size=(ndim,)) + +def write_netcdf(filename,dtype='f8',complevel=6): + nc = Dataset(filename,'w') + nc.createDimension('n', ndim) + foo = nc.createVariable('data2',\ + dtype,('n'),compression='bzip2',complevel=complevel) + foo[:] = array + nc.close() + +class CompressionTestCase(unittest.TestCase): + + def setUp(self): + self.filename1 = filename1 + self.filename2 = filename2 + write_netcdf(self.filename1,complevel=0) # no compression + write_netcdf(self.filename2,complevel=4) # with compression + + def tearDown(self): + # Remove the 
temporary files + os.remove(self.filename1) + os.remove(self.filename2) + + def runTest(self): + uncompressed_size = os.stat(self.filename1).st_size + # check uncompressed data + f = Dataset(self.filename) + size = os.stat(self.filename1).st_size + assert_almost_equal(array,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert_almost_equal(size,uncompressed_size) + f.close() + # check compressed data. + f = Dataset(self.filename2) + size = os.stat(self.filename2).st_size + assert_almost_equal(array,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':True,'shuffle':False,'complevel':4,'fletcher32':False} + print(size, uncompressed_size) + assert(size < 0.95*uncompressed_size) + f.close() + +if __name__ == '__main__': + unittest.main() diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py new file mode 100644 index 000000000..f3a1873a2 --- /dev/null +++ b/test/tst_compression_zstd.py @@ -0,0 +1,53 @@ +from numpy.random.mtrand import uniform +from netCDF4 import Dataset +from numpy.testing import assert_almost_equal +import os, tempfile, unittest + +ndim = 100000 +filename1 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +filename2 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +array = uniform(size=(ndim,)) + +def write_netcdf(filename,dtype='f8',complevel=6): + nc = Dataset(filename,'w') + nc.createDimension('n', ndim) + foo = nc.createVariable('data2',\ + dtype,('n'),compression='zstd',complevel=complevel) + foo[:] = array + nc.close() + +class CompressionTestCase(unittest.TestCase): + + def setUp(self): + self.filename1 = filename1 + self.filename2 = filename2 + write_netcdf(self.filename1,complevel=0) # no compression + write_netcdf(self.filename2,complevel=4) # with compression + + def tearDown(self): + # Remove the temporary files + os.remove(self.filename1) + os.remove(self.filename2) + + def runTest(self): + uncompressed_size = os.stat(self.filename1).st_size + # check uncompressed data + f = Dataset(self.filename) + size = os.stat(self.filename1).st_size + assert_almost_equal(array,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert_almost_equal(size,uncompressed_size) + f.close() + # check compressed data. 
+ f = Dataset(self.filename2) + size = os.stat(self.filename2).st_size + assert_almost_equal(array,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} + print(size, uncompressed_size) + assert(size < 0.95*uncompressed_size) + f.close() + +if __name__ == '__main__': + unittest.main() From 57aec6634d21e9a7eafc3389ba1413ac22319a78 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 13:32:40 -0600 Subject: [PATCH 15/92] update --- test/run_all.py | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/test/run_all.py b/test/run_all.py index 723b0cdd6..df462c10d 100755 --- a/test/run_all.py +++ b/test/run_all.py @@ -1,7 +1,9 @@ import glob, os, sys, unittest, struct from netCDF4 import getlibversion,__hdf5libversion__,__netcdf4libversion__,__version__ from netCDF4 import __has_cdf5_format__, __has_nc_inq_path__, __has_nc_create_mem__, \ - __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__ + __has_parallel4_support__, __has_pnetcdf_support__, \ + __has_zstandard_support__, __has_bzip2_support__, \ + __has_blosc_support__,__has_quantization_support__ # can also just run # python -m unittest discover . 'tst*py' @@ -21,8 +23,14 @@ test_files.remove('tst_cdf5.py') sys.stdout.write('not running tst_cdf5.py ...\n') if not __has_quantization_support__: - test_files.remove('tst_compression2.py') - sys.stdout.write('not running tst_compression2.py ...\n') + test_files.remove('tst_compression_quant.py') + sys.stdout.write('not running tst_compression_quant.py ...\n') +if not __has_zstandard_support__: + test_files.remove('tst_compression_zstd.py') + sys.stdout.write('not running tst_compression_quant.py ...\n') +if not __has_bzip2_support__: + test_files.remove('tst_compression_bzip2.py') + sys.stdout.write('not running tst_compression_bzip2.py ...\n') # Don't run tests that require network connectivity if os.getenv('NO_NET'): From 977a016f054a868e981c0b331b861d34366e8ffc Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 15:32:52 -0600 Subject: [PATCH 16/92] add __has_blosc_support__ --- src/netCDF4/_netCDF4.pyx | 1 + 1 file changed, 1 insertion(+) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 0c9e0febc..bf0350596 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1323,6 +1323,7 @@ __has_pnetcdf_support__ = HAS_PNETCDF_SUPPORT __has_quantization_support__ = HAS_QUANTIZATION_SUPPORT __has_zstandard_support__ = HAS_ZSTANDARD_SUPPORT __has_bzip2_support__ = HAS_BZIP2_SUPPORT +__has_blosc_support__ = HAS_BLOSC_SUPPORT _needsworkaround_issue485 = __netcdf4libversion__ < "4.4.0" or \ (__netcdf4libversion__.startswith("4.4.0") and \ "-development" in __netcdf4libversion__) From be86c9a3d5320408222b152e3205cb7bdb7917d3 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 15:35:30 -0600 Subject: [PATCH 17/92] add __has_blosc_support__ --- src/netCDF4/__init__.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py index f9fe6a736..b79cf6323 100644 --- a/src/netCDF4/__init__.py +++ b/src/netCDF4/__init__.py @@ -9,6 +9,6 @@ __has_nc_create_mem__, __has_cdf5_format__, __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__, __has_zstandard_support__, - __has_bzip2_support__) + __has_bzip2_support__, __has_blosc_support__) __all__ =\ 
['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] From 55cc3657b4060b2bdf1c135342e54747c992e535 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 15:47:26 -0600 Subject: [PATCH 18/92] update --- test/tst_compression.py | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/test/tst_compression.py b/test/tst_compression.py index 3aa672017..9c9bd4929 100644 --- a/test/tst_compression.py +++ b/test/tst_compression.py @@ -88,8 +88,10 @@ def runTest(self): size = os.stat(self.files[0]).st_size assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'compression':None,'zlib':False,'shuffle':False,'complevel':0,'fletcher32':False} - assert f.variables['data2'].filters() == {'compression':None,'zlib':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert f.variables['data2'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. @@ -97,7 +99,8 @@ def runTest(self): size = os.stat(self.files[1]).st_size assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} + assert f.variables['data'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':6,'fletcher32':False} assert f.variables['data2'].filters() == {'zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() @@ -106,8 +109,10 @@ def runTest(self): size = os.stat(self.files[2]).st_size assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) - assert f.variables['data'].filters() == {'zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} - assert f.variables['data2'].filters() == {'zlib':True,'shuffle':True,'complevel':6,'fletcher32':False} + assert f.variables['data'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':False} + assert f.variables['data2'].filters() ==\ + {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle From ac7d4ce5ac992b7a6c2b50ad50044fc807cab1af Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 16:01:07 -0600 Subject: [PATCH 19/92] update --- test/tst_compression.py | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/test/tst_compression.py b/test/tst_compression.py index 9c9bd4929..f10ec9a9f 100644 --- a/test/tst_compression.py +++ b/test/tst_compression.py @@ -101,7 +101,8 @@ def runTest(self): assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':6,'fletcher32':False} - assert f.variables['data2'].filters() == {'zlib':True,'shuffle':False,'complevel':6,'fletcher32':False} + assert f.variables['data2'].filters() ==\ + 
{'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':6,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle From 4adc53efb3edb424e08d15a51060524671d32070 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 16:19:06 -0600 Subject: [PATCH 20/92] update --- src/netCDF4/_netCDF4.pyx | 4 ++-- test/tst_compression_bzip2.py | 7 +++---- test/tst_compression_zstd.py | 11 +++++------ 3 files changed, 10 insertions(+), 12 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index bf0350596..802b42791 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -3940,7 +3940,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. msg = """ compression='zstd' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-python.""" - raise ValueError(msg) + raise ValueError(msg) if bzip2: IF HAS_BZIP2_SUPPORT: icomplevel = complevel @@ -3952,7 +3952,7 @@ version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-pyt msg = """ compression='bzip2' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python.""" - raise ValueError(msg) + raise ValueError(msg) # set checksum. if fletcher32 and ndims: # don't bother for scalar variable ierr = nc_def_var_fletcher32(self._grpid, self._varid, 1) diff --git a/test/tst_compression_bzip2.py b/test/tst_compression_bzip2.py index 56c8e0f5b..a8154e88a 100644 --- a/test/tst_compression_bzip2.py +++ b/test/tst_compression_bzip2.py @@ -11,7 +11,7 @@ def write_netcdf(filename,dtype='f8',complevel=6): nc = Dataset(filename,'w') nc.createDimension('n', ndim) - foo = nc.createVariable('data2',\ + foo = nc.createVariable('data',\ dtype,('n'),compression='bzip2',complevel=complevel) foo[:] = array nc.close() @@ -32,7 +32,7 @@ def tearDown(self): def runTest(self): uncompressed_size = os.stat(self.filename1).st_size # check uncompressed data - f = Dataset(self.filename) + f = Dataset(self.filename1) size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ @@ -45,8 +45,7 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ {'zlib':False,'zstd':False,'bzip2':True,'shuffle':False,'complevel':4,'fletcher32':False} - print(size, uncompressed_size) - assert(size < 0.95*uncompressed_size) + assert(size < 0.96*uncompressed_size) f.close() if __name__ == '__main__': diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py index f3a1873a2..95292b4bf 100644 --- a/test/tst_compression_zstd.py +++ b/test/tst_compression_zstd.py @@ -11,7 +11,7 @@ def write_netcdf(filename,dtype='f8',complevel=6): nc = Dataset(filename,'w') nc.createDimension('n', ndim) - foo = nc.createVariable('data2',\ + foo = nc.createVariable('data',\ dtype,('n'),compression='zstd',complevel=complevel) foo[:] = array nc.close() @@ -32,7 +32,7 @@ def tearDown(self): def runTest(self): uncompressed_size = os.stat(self.filename1).st_size # check uncompressed data - f = Dataset(self.filename) + f = Dataset(self.filename1) size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ @@ -43,10 +43,9 @@ def runTest(self): f = Dataset(self.filename2) size = 
os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) - assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} - print(size, uncompressed_size) - assert(size < 0.95*uncompressed_size) + #assert f.variables['data'].filters() ==\ + #{'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} + assert(size < 0.96*uncompressed_size) f.close() if __name__ == '__main__': From 5e76baf9a19a73a0035d23fe431ef9ac6aa224b0 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 16:24:58 -0600 Subject: [PATCH 21/92] update --- test/tst_compression_zstd.py | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py index 95292b4bf..4114976b5 100644 --- a/test/tst_compression_zstd.py +++ b/test/tst_compression_zstd.py @@ -43,8 +43,9 @@ def runTest(self): f = Dataset(self.filename2) size = os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) - #assert f.variables['data'].filters() ==\ - #{'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} + #print(f.variables['data'].filters()) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) f.close() From c82723bbc48a133e9469b94c8c093b9e6d31a356 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 16:47:23 -0600 Subject: [PATCH 22/92] fix complevel in filters method --- src/netCDF4/_netCDF4.pyx | 12 +++++++----- test/tst_compression_zstd.py | 1 - 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 802b42791..0a564c4ad 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4334,7 +4334,7 @@ attributes.""" **`filters(self)`** return dictionary containing HDF5 filter parameters.""" - cdef int ierr,ideflate,ishuffle,icomplevel,ifletcher32 + cdef int ierr,ideflate,ishuffle,icomplevel,icomplevel_zstd,icomplevel_bzip2,ifletcher32 cdef int izstd=0 cdef int ibzip2=0 filtdict = {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} @@ -4346,20 +4346,22 @@ return dictionary containing HDF5 filter parameters.""" ierr = nc_inq_var_fletcher32(self._grpid, self._varid, &ifletcher32) _ensure_nc_success(ierr) IF HAS_ZSTANDARD_SUPPORT: - ierr = nc_inq_var_zstandard(self._grpid, self._varid, &izstd, &icomplevel) + ierr = nc_inq_var_zstandard(self._grpid, self._varid, &izstd,\ + &icomplevel_zstd) _ensure_nc_success(ierr) IF HAS_BZIP2_SUPPORT: - ierr = nc_inq_var_bzip2(self._grpid, self._varid, &ibzip2, &icomplevel) + ierr = nc_inq_var_bzip2(self._grpid, self._varid, &ibzip2,\ + &icomplevel_bzip2) _ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel if izstd: filtdict['zstd']=True - filtdict['complevel']=icomplevel + filtdict['complevel']=icomplevel_zstd if ibzip2: filtdict['bzip2']=True - filtdict['complevel']=icomplevel + filtdict['complevel']=icomplevel_bzip2 if ishuffle: filtdict['shuffle']=True if ifletcher32: diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py index 4114976b5..c270ee4eb 100644 --- a/test/tst_compression_zstd.py +++ b/test/tst_compression_zstd.py @@ -43,7 +43,6 @@ def runTest(self): f = Dataset(self.filename2) size = os.stat(self.filename2).st_size 
assert_almost_equal(array,f.variables['data'][:]) - #print(f.variables['data'].filters()) assert f.variables['data'].filters() ==\ {'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) From 846cf450be653a8f53ff3eb1560a26229fc6a107 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 17:03:15 -0600 Subject: [PATCH 23/92] install filter plugins --- .github/workflows/build_master.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 81c43b455..619ad7bbf 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -34,6 +34,8 @@ jobs: ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 make -j 2 make install + mkdir -p /usr/local/hdf5/lib + /bin/cp -R plugins/.libs/* /usr/local/hdf5/lib popd # - name: The job has failed From 1bd6a801db3ba6eec541d998d0c7f7dc524e79a1 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 17:21:33 -0600 Subject: [PATCH 24/92] update --- .github/workflows/build_master.yml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 619ad7bbf..b4b74bb8b 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -34,8 +34,6 @@ jobs: ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 make -j 2 make install - mkdir -p /usr/local/hdf5/lib - /bin/cp -R plugins/.libs/* /usr/local/hdf5/lib popd # - name: The job has failed @@ -56,6 +54,7 @@ jobs: - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} + export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs python checkversion.py # serial cd test From 96e993be578b701495334c065cd02b1e674a6c27 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 17:48:30 -0600 Subject: [PATCH 25/92] update --- .github/workflows/build_master.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index b4b74bb8b..97c0a40c8 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -55,6 +55,7 @@ jobs: run: | export PATH=${NETCDF_DIR}/bin:${PATH} export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs + ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial cd test From 2a1beed18fc71b15e1aa84f4761831c63865da35 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 18:25:45 -0600 Subject: [PATCH 26/92] update --- .github/workflows/build_master.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 97c0a40c8..9a609188d 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -54,7 +54,7 @@ jobs: - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} - export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs + export HDF5_PLUGIN_PATH=${NETCDF_DIR}/netcdf4-python/plugins/.libs ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial From 4c8c427fc4eb7b8efc4c8bbdab568eed1cb70e53 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 18:58:21 -0600 Subject: [PATCH 27/92] update --- .github/workflows/build_master.yml | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 
9a609188d..e4903e29d 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -54,7 +54,7 @@ jobs: - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} - export HDF5_PLUGIN_PATH=${NETCDF_DIR}/netcdf4-python/plugins/.libs + export HDF5_PLUGIN_PATH=${NETCDF_DIR}/../netcdf-c/plugins/.libs ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial @@ -62,7 +62,6 @@ jobs: python run_all.py # parallel cd ../examples - python bench_compress4.py mpirun.mpich -np 4 python mpi_example.py if [ $? -ne 0 ] ; then echo "hdf5 mpi test failed!" From bd25b57af948757f01a4f3fbb16e5b8b98815297 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 19:08:29 -0600 Subject: [PATCH 28/92] update --- .github/workflows/build_master.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index e4903e29d..5673a308f 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -54,7 +54,7 @@ jobs: - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} - export HDF5_PLUGIN_PATH=${NETCDF_DIR}/../netcdf-c/plugins/.libs + export HDF5_PLUGIN_PATH=${NETCDF_DIR}/netcdf-c/plugins/.libs ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial From 9b1b48446e4dd0be10afaaaf6efa411aa5ccc057 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 19:16:51 -0600 Subject: [PATCH 29/92] update --- .github/workflows/build_master.yml | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 5673a308f..1c9e78315 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -23,7 +23,7 @@ jobs: - name: Install Ubuntu Dependencies run: | sudo apt-get update - sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libblosc-dev libzstd-dev + sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libsnappy-dev libblosc-dev libzstd-dev echo "Download and build netCDF github master" git clone https://github.com/Unidata/netcdf-c pushd netcdf-c @@ -34,6 +34,8 @@ jobs: ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 make -j 2 make install + pwd + ls -l plugins/.libs popd # - name: The job has failed From 57b884406be71f8e4b5c99f3f3867ff43d579152 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 19:22:28 -0600 Subject: [PATCH 30/92] update --- .github/workflows/build_master.yml | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 1c9e78315..285b9b3f6 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -35,7 +35,8 @@ jobs: make -j 2 make install pwd - ls -l plugins/.libs + mkdir ${NETCDF_DIR}/hdf5_plugins + /bin/mv -f plugins ${NETCDF_DIR} popd # - name: The job has failed @@ -56,7 +57,7 @@ jobs: - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} - export HDF5_PLUGIN_PATH=${NETCDF_DIR}/netcdf-c/plugins/.libs + export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial From 489d0308f21df6e97ec77db3d4915eedb8075308 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 19:28:12 -0600 Subject: [PATCH 31/92] add blosc compressors --- .github/workflows/build_master.yml | 1 - Changelog | 10 ++-- src/netCDF4/_netCDF4.pyx | 83 
+++++++++++++++++++++++++----- 3 files changed, 76 insertions(+), 18 deletions(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 285b9b3f6..35e6d43fb 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -58,7 +58,6 @@ jobs: run: | export PATH=${NETCDF_DIR}/bin:${PATH} export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs - ls -l $HDF5_PLUGIN_PATH python checkversion.py # serial cd test diff --git a/Changelog b/Changelog index d3cbac8b2..bbb3dfe03 100644 --- a/Changelog +++ b/Changelog @@ -12,11 +12,11 @@ * remove all vestiges of python 2 in _netCDF4.pyx and set cython language_level directive to 3 in setup.py. * add 'compression' kwarg to createVariable to enable new compression - functionality in netcdf-c 4.9.0. 'None', 'zlib', 'zstd' and 'bzip2 are currently - allowed (compression='zlib' is equivalent to zlib=True), but allows - for new compression algorithms to be added when they become available - in netcdf-c. The 'zlib' kwarg is now deprecated. Blosc compression feature - in netcdf-c 4.9.0 not yet supported. + functionality in netcdf-c 4.9.0. 'None','zlib','zstd','bzip2' + 'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib','blosc_zstd' and + 'blosc_snappy' are currently supported. 'blosc_shuffle' and + 'blosc_blocksize' kwargs also added. + compression='zlib' is equivalent to (the now deprecated) zlib=True. * MFDataset did not aggregate 'name' variable attribute (issue #1153). * issue warning instead of raising an exception if missing_value or _FillValue can't be cast to the variable type when creating a diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 0a564c4ad..b686716f0 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2641,7 +2641,8 @@ datatype.""" def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, fletcher32=False, contiguous=False, + complevel=4, shuffle=True, + blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None): """ @@ -2682,7 +2683,9 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently 'zlib','zstd' and 'bzip2' are supported. Default is `None` (no compression). +Currently 'zlib','zstd','bzip2','blosc_' are supported +(where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). +Default is `None` (no compression). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -2697,6 +2700,12 @@ will be applied before compressing the data (default `True`). This significantly improves compression. Default is `True`. Ignored if `zlib=False`. +The optional kwargs 'blosc_shuffle` and `blosc_blocksize` are ignored +unless the blosc compressor is used. `blosc_shuffle` can be 0 (no shuffle), +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. `blosc_blocksize` +is the tunable blosc blocksize in bytes (Default 0 means the blocksize is +chosen internally). + If the optional keyword `fletcher32` is `True`, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default `False`. 
@@ -3648,12 +3657,13 @@ behavior is similar to Fortran or Matlab, but different than numpy. def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, fletcher32=False, contiguous=False, + complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, + fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): """ **`__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, fletcher32=False, contiguous=False, + complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)`** @@ -3687,7 +3697,9 @@ behavior is similar to Fortran or Matlab, but different than numpy. which means the variable is a scalar (and therefore has no dimensions). **`compression`**: compression algorithm to use. Default None. Currently - 'zlib','zstd' and 'bzip2' are supported. + 'zlib','zstd','bzip2','blosc_' are supported + (where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). + Default is `None` (no compression). **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. Deprecated - use @@ -3700,6 +3712,14 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`shuffle`**: if `True`, the HDF5 shuffle filter is applied to improve compression. Default `True`. Ignored if `compression=None`. + **`blosc_shuffle`**: shuffle filter inside blosc compressor (only + relevant if compression kwarg set to one of the blosc compressors). + Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise + shuffle)). Default is 0. + + **`blosc_blocksize`**: tunable blocksize in bytes for blosc + compressors. Default of 0 means blosc library chooses a blocksize. + **`fletcher32`**: if `True` (default `False`), the Fletcher32 checksum algorithm is used for error detection. @@ -3731,8 +3751,8 @@ behavior is similar to Fortran or Matlab, but different than numpy. The `compression, zlib, complevel, shuffle, fletcher32, contiguous` and `chunksizes` keywords are silently ignored for netCDF 3 files that do not use HDF5. - **`least_significant_digit`**: If this or `significant_digits` are specified, - variable data will be truncated (quantized). + **`least_significant_digit`**: If this or `significant_digits` are specified, + variable data will be truncated (quantized). In conjunction with `compression='zlib'` this produces 'lossy', but significantly more efficient compression. For example, if `least_significant_digit=1`, data will be quantized using @@ -3740,7 +3760,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. so that a precision of 0.1 is retained (in this case bits=4). Default is `None`, or no quantization. - **`significant_digits`**: New in version 1.6.0. + **`significant_digits`**: New in version 1.6.0. As described for `least_significant_digit` except the number of significant digits retained is prescribed independent of the floating point exponent. Default `None` - no quantization done. @@ -3748,7 +3768,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`quantize_mode`**: New in version 1.6.0. Controls the quantization algorithm (default 'BitGroom', 'BitRound' and 'GranularBitRound' also available). 
The 'GranularBitRound' - algorithm may result in better compression for typical geophysical datasets. + algorithm may result in better compression for typical geophysical datasets. Ignored if `significant_digts` not specified. If 'BitRound' is used, then `significant_digits` is interpreted as binary (not decimal) digits. @@ -3759,14 +3779,15 @@ behavior is similar to Fortran or Matlab, but different than numpy. in the dictionary `netCDF4.default_fillvals`. **`chunk_cache`**: If specified, sets the chunk cache size for this variable. - Persists as long as Dataset is open. Use `set_var_chunk_cache` to - change it when Dataset is re-opened. + Persists as long as Dataset is open. Use `set_var_chunk_cache` to + change it when Dataset is re-opened. ***Note***: `Variable` instances should be created using the `Dataset.createVariable` method of a `Dataset` or `Group` instance, not using this class directly. """ - cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd + cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd,\ + iblosc_blocksize,iblosc_compressor,iblosc_shuffle cdef char namstring[NC_MAX_NAME+1] cdef char *varname cdef nc_type xtype @@ -3786,12 +3807,30 @@ behavior is similar to Fortran or Matlab, but different than numpy. zlib = False zstd = False bzip2 = False + blosc_lz = False + blosc_lz4 = False + blosc_lz4hc = False + blosc_snappy = False + blosc_zlib = False + blosc_zstd = False if compression == 'zlib': zlib = True elif compression == 'zstd': zstd = True elif compression == 'bzip2': bzip2 = True + elif compression == 'blosc_lz': + blosc_lz = True + elif compression == 'blosc_lz': + blosc_lz4 = True + elif compression == 'blosc_lz4hc': + blosc_lz4hc = True + elif compression == 'blosc_snappy': + blosc_snappy = True + elif compression == 'blosc_zlib': + blosc_zlib = True + elif compression == 'blosc_zstd': + blosc_zstd = True elif not compression: compression = None # if compression evaluates to False, set to None. pass @@ -3953,6 +3992,26 @@ version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-pyt compression='bzip2' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python.""" raise ValueError(msg) + if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or\ + blosc_zstd or blosc_snappy: + IF HAS_BLOSC_SUPPORT: + icomplevel = complevel + blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} + iblosc_compression = blosc_dict[compression] + iblosc_shuffle = blosc_shuffle + iblosc_blocksize = blosc_blocksize + ierr = nc_def_var_blosc(self._grpid, self._varid,\ + iblosc_compressor,\ + icomplevel,iblosc_blocksize,\ + iblosc_shuffle) + if ierr != NC_NOERR: + if grp.data_model != 'NETCDF4': grp._enddef() + _ensure_nc_success(ierr) + ELSE: + msg = """ +compression='blosc_*' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have +version 4.9.0 or higher netcdf-c with blosc support, and rebuild netcdf4-python.""" + raise ValueError(msg) # set checksum. 
if fletcher32 and ndims: # don't bother for scalar variable ierr = nc_def_var_fletcher32(self._grpid, self._varid, 1) From 72f18f623650650372c0d4e4773211e2fa219c30 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 19:42:23 -0600 Subject: [PATCH 32/92] update blosc stuff --- src/netCDF4/_netCDF4.pyx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index b686716f0..1b4b08a3f 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -3786,8 +3786,8 @@ behavior is similar to Fortran or Matlab, but different than numpy. `Dataset.createVariable` method of a `Dataset` or `Group` instance, not using this class directly. """ - cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd,\ - iblosc_blocksize,iblosc_compressor,iblosc_shuffle + cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd, + cdef unsigned iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle cdef char namstring[NC_MAX_NAME+1] cdef char *varname cdef nc_type xtype @@ -3821,7 +3821,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. bzip2 = True elif compression == 'blosc_lz': blosc_lz = True - elif compression == 'blosc_lz': + elif compression == 'blosc_lz4': blosc_lz4 = True elif compression == 'blosc_lz4hc': blosc_lz4hc = True @@ -3995,14 +3995,14 @@ version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python. if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or\ blosc_zstd or blosc_snappy: IF HAS_BLOSC_SUPPORT: - icomplevel = complevel blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} iblosc_compression = blosc_dict[compression] iblosc_shuffle = blosc_shuffle iblosc_blocksize = blosc_blocksize + iblosc_complevel = complevel ierr = nc_def_var_blosc(self._grpid, self._varid,\ iblosc_compressor,\ - icomplevel,iblosc_blocksize,\ + iblosc_complevel,iblosc_blocksize,\ iblosc_shuffle) if ierr != NC_NOERR: if grp.data_model != 'NETCDF4': grp._enddef() From 67137892246e79451359810e8783cbb0c31162c3 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 20:26:02 -0600 Subject: [PATCH 33/92] test for blosc compression --- src/netCDF4/_netCDF4.pyx | 24 ++++++++++++---- test/tst_compression.py | 18 ++++++------ test/tst_compression_blosc.py | 54 +++++++++++++++++++++++++++++++++++ test/tst_compression_bzip2.py | 4 +-- test/tst_compression_quant.py | 4 +-- test/tst_compression_zstd.py | 4 +-- 6 files changed, 88 insertions(+), 20 deletions(-) create mode 100644 test/tst_compression_blosc.py diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 1b4b08a3f..5bb4bffc5 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1369,6 +1369,9 @@ _format_dict = {'NETCDF3_CLASSIC' : NC_FORMAT_CLASSIC, _cmode_dict = {'NETCDF3_CLASSIC' : NC_CLASSIC_MODEL, 'NETCDF4_CLASSIC' : NC_CLASSIC_MODEL | NC_NETCDF4, 'NETCDF4' : NC_NETCDF4} +# dicts for blosc compressors. +_blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} +_blosc_dict_inv = {v: k for k, v in _blosc_dict.items()} IF HAS_CDF5_FORMAT: # NETCDF3_64BIT deprecated, saved for compatibility. # use NETCDF3_64BIT_OFFSET instead. @@ -2817,6 +2820,7 @@ is the number of variable dimensions.""" # create variable. 
group.variables[varname] = Variable(group, varname, datatype, dimensions=dimensions, compression=compression, zlib=zlib, complevel=complevel, shuffle=shuffle, + blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize, fletcher32=fletcher32, contiguous=contiguous, chunksizes=chunksizes, endian=endian, least_significant_digit=least_significant_digit, significant_digits=significant_digits,quantize_mode=quantize_mode,fill_value=fill_value, chunk_cache=chunk_cache) @@ -3657,7 +3661,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, + complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): @@ -3787,7 +3791,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. `Group` instance, not using this class directly. """ cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd, - cdef unsigned iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle + cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle cdef char namstring[NC_MAX_NAME+1] cdef char *varname cdef nc_type xtype @@ -3995,8 +3999,7 @@ version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python. if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or\ blosc_zstd or blosc_snappy: IF HAS_BLOSC_SUPPORT: - blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} - iblosc_compression = blosc_dict[compression] + iblosc_compressor = _blosc_dict[compression] iblosc_shuffle = blosc_shuffle iblosc_blocksize = blosc_blocksize iblosc_complevel = complevel @@ -4396,7 +4399,9 @@ return dictionary containing HDF5 filter parameters.""" cdef int ierr,ideflate,ishuffle,icomplevel,icomplevel_zstd,icomplevel_bzip2,ifletcher32 cdef int izstd=0 cdef int ibzip2=0 - filtdict = {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + cdef int iblosc=0 + cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle + filtdict = {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} if self._grp.data_model not in ['NETCDF4_CLASSIC','NETCDF4']: return with nogil: ierr = nc_inq_var_deflate(self._grpid, self._varid, &ishuffle, &ideflate, &icomplevel) @@ -4412,6 +4417,10 @@ return dictionary containing HDF5 filter parameters.""" ierr = nc_inq_var_bzip2(self._grpid, self._varid, &ibzip2,\ &icomplevel_bzip2) _ensure_nc_success(ierr) + IF HAS_BLOSC_SUPPORT: + ierr = nc_inq_var_blosc(self._grpid, self._varid, &iblosc,\ + &iblosc_compressor,&iblosc_complevel,&iblosc_blocksize,&iblosc_shuffle) + _ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel @@ -4421,6 +4430,11 @@ return dictionary containing HDF5 filter parameters.""" if ibzip2: filtdict['bzip2']=True filtdict['complevel']=icomplevel_bzip2 + if iblosc: + #filtdict['blosc']=True + blosc_compressor = iblosc_compressor + filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} + filtdict['complevel']=iblosc_complevel if ishuffle: filtdict['shuffle']=True if ifletcher32: diff --git a/test/tst_compression.py 
b/test/tst_compression.py index f10ec9a9f..c47422910 100644 --- a/test/tst_compression.py +++ b/test/tst_compression.py @@ -89,9 +89,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. @@ -100,9 +100,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':6,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':6,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -111,9 +111,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle @@ -138,9 +138,9 @@ def runTest(self): assert_almost_equal(checkarray,f.variables['data'][:]) assert_almost_equal(checkarray,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert(size < 0.20*uncompressed_size) # should be slightly larger than without fletcher32 assert(size > size_save) @@ -150,7 +150,7 @@ def runTest(self): checkarray2 = _quantize(array2,lsd) assert_almost_equal(checkarray2,f.variables['data2'][:]) assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert f.variables['data2'].chunking() == [chunk1,chunk2] f.close() diff --git a/test/tst_compression_blosc.py b/test/tst_compression_blosc.py new file mode 100644 index 000000000..4324f26c1 --- /dev/null +++ 
b/test/tst_compression_blosc.py @@ -0,0 +1,54 @@ +from numpy.random.mtrand import uniform +from netCDF4 import Dataset +from numpy.testing import assert_almost_equal +import os, tempfile, unittest + +ndim = 100000 +filename1 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +filename2 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +array = uniform(size=(ndim,)) + +def write_netcdf(filename,dtype='f8',complevel=6): + nc = Dataset(filename,'w') + nc.createDimension('n', ndim) + foo = nc.createVariable('data',\ + dtype,('n'),compression='blosc_lz4',blosc_shuffle=2,complevel=complevel) + foo[:] = array + nc.close() + +class CompressionTestCase(unittest.TestCase): + + def setUp(self): + self.filename1 = filename1 + self.filename2 = filename2 + write_netcdf(self.filename1,complevel=0) # no compression + write_netcdf(self.filename2,complevel=4) # with compression + + def tearDown(self): + # Remove the temporary files + os.remove(self.filename1) + os.remove(self.filename2) + + def runTest(self): + uncompressed_size = os.stat(self.filename1).st_size + # check uncompressed data + f = Dataset(self.filename1) + size = os.stat(self.filename1).st_size + assert_almost_equal(array,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert_almost_equal(size,uncompressed_size) + f.close() + # check compressed data. + f = Dataset(self.filename2) + size = os.stat(self.filename2).st_size + assert_almost_equal(array,f.variables['data'][:]) + dtest= {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc':\ + {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000},\ + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data'].filters() == dtest + assert(size < 0.96*uncompressed_size) + f.close() + +if __name__ == '__main__': + unittest.main() diff --git a/test/tst_compression_bzip2.py b/test/tst_compression_bzip2.py index a8154e88a..8a162f20c 100644 --- a/test/tst_compression_bzip2.py +++ b/test/tst_compression_bzip2.py @@ -36,7 +36,7 @@ def runTest(self): size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. 
@@ -44,7 +44,7 @@ def runTest(self): size = os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':True,'shuffle':False,'complevel':4,'fletcher32':False} + {'zlib':False,'zstd':False,'bzip2':True,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) f.close() diff --git a/test/tst_compression_quant.py b/test/tst_compression_quant.py index 3fb42c298..9a44420c3 100644 --- a/test/tst_compression_quant.py +++ b/test/tst_compression_quant.py @@ -59,7 +59,7 @@ def runTest(self): #print('compressed lossless no shuffle = ',size) assert_almost_equal(data_array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':False,'complevel':complevel,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':complevel,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -68,7 +68,7 @@ def runTest(self): #print('compressed lossless with shuffle ',size) assert_almost_equal(data_array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'shuffle':True,'complevel':complevel,'fletcher32':False} + {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':complevel,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py index c270ee4eb..cd9d270ef 100644 --- a/test/tst_compression_zstd.py +++ b/test/tst_compression_zstd.py @@ -36,7 +36,7 @@ def runTest(self): size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. @@ -44,7 +44,7 @@ def runTest(self): size = os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':True,'bzip2':False,'shuffle':False,'complevel':4,'fletcher32':False} + {'zlib':False,'zstd':True,'bzip2':False,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) f.close() From 79cad3858bcb86d9d7631ad1df692d19b51767cf Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 20:26:28 -0600 Subject: [PATCH 34/92] update docs --- docs/index.html | 42 +++++++++++++++++++++++++++++++----------- 1 file changed, 31 insertions(+), 11 deletions(-) diff --git a/docs/index.html b/docs/index.html index 3d402ea44..5304c3111 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1612,7 +1612,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -1634,7 +1634,9 @@

    In-memory (diskless) Datasets

    __has_rename_grp__, __has_nc_inq_path__, __has_nc_inq_format_extended__, __has_nc_open_mem__, __has_nc_create_mem__, __has_cdf5_format__, - __has_parallel4_support__, __has_pnetcdf_support__,__has_quantization_support__) + __has_parallel4_support__, __has_pnetcdf_support__, + __has_quantization_support__, __has_zstandard_support__, + __has_bzip2_support__, __has_blosc_support__) __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
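    The build-time flags added here (__has_quantization_support__, __has_zstandard_support__, __has_bzip2_support__, __has_blosc_support__) can be checked at runtime before choosing a compressor. A minimal sketch; which flags exist depends on the installed netCDF4 version:

        import netCDF4

        # flags are set when netcdf4-python is built against a given netcdf-c
        print(netCDF4.__version__, netCDF4.__netcdf4libversion__)
        if netCDF4.__has_blosc_support__:
            print("blosc_* compressors available via the compression kwarg")
        if netCDF4.__has_zstandard_support__:
            print("compression='zstd' available")
        if netCDF4.__has_bzip2_support__:
            print("compression='bzip2' available")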
    @@ -2090,7 +2092,9 @@

    In-memory (diskless) Datasets

    If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently only 'zlib' is supported. Default is None (no compression).

    +Currently 'zlib','zstd','bzip2' and 'blosc_*' are supported +(where * can be one of lz,lz4,lz4hc,zlib,zstd,snappy). +Default is None (no compression).

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2105,6 +2109,12 @@

    In-memory (diskless) Datasets

    significantly improves compression. Default is True. Ignored if zlib=False.

    +

    The optional kwargs blosc_shuffle and blosc_blocksize are ignored +unless the blosc compressor is used. blosc_shuffle can be 0 (no shuffle), +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. blosc_blocksize +is the tunable blosc blocksize in bytes (Default 0 means the blocksize is +chosen internally).
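    To see the new kwargs together, here is a minimal sketch modeled on the tst_compression_blosc.py test added later in this series; the file and variable names are illustrative and a blosc-capable netcdf-c (>= 4.9.0) is assumed:

        import numpy as np
        from netCDF4 import Dataset

        nc = Dataset("blosc_example.nc", "w")
        nc.createDimension("n", 100000)
        var = nc.createVariable("data", "f8", ("n",),
                                compression="blosc_lz4",  # one of the new blosc_* names
                                complevel=4,              # same meaning as for zlib
                                blosc_shuffle=2)          # 2 = bit-wise shuffle
        var[:] = np.random.uniform(size=100000)
        nc.close()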

    +

    If the optional keyword fletcher32 is True, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default False.

    @@ -2836,7 +2846,7 @@

    In-memory (diskless) Datasets

    __init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, -complevel=4, shuffle=True, fletcher32=False, contiguous=False, +complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)

    @@ -2870,7 +2880,9 @@

    In-memory (diskless) Datasets

    which means the variable is a scalar (and therefore has no dimensions).

    compression: compression algorithm to use. Default None. Currently -only 'zlib' is supported.

    +'zlib','zstd','bzip2' and 'blosc_*' are supported +(where * can be one of lz,lz4,lz4hc,zlib,zstd,snappy). +Default is None (no compression).

    zlib: if True, data assigned to the Variable instance is compressed on disk. Default False. Deprecated - use @@ -2883,6 +2895,14 @@

    In-memory (diskless) Datasets

    shuffle: if True, the HDF5 shuffle filter is applied to improve compression. Default True. Ignored if compression=None.

    +

    blosc_shuffle: shuffle filter inside blosc compressor (only +relevant if compression kwarg set to one of the blosc compressors). +Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise +shuffle)). Default is 0.

    + +

    blosc_blocksize: tunable blocksize in bytes for blosc +compressors. Default of 0 means blosc library chooses a blocksize.
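    The filters() method is extended earlier in this patch to report the blosc parameters as a nested dict. An illustrative check, with the expected output modeled on the new test; it reads the file written in the sketch further above:

        from netCDF4 import Dataset

        nc = Dataset("blosc_example.nc")
        print(nc.variables["data"].filters())
        # roughly: {'zlib': False, 'zstd': False, 'bzip2': False,
        #           'blosc': {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000},
        #           'shuffle': False, 'complevel': 4, 'fletcher32': False}
        nc.close()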

    +

    fletcher32: if True (default False), the Fletcher32 checksum algorithm is used for error detection.

    @@ -2914,8 +2934,8 @@

    In-memory (diskless) Datasets

    The compression, zlib, complevel, shuffle, fletcher32, contiguous and chunksizes keywords are silently ignored for netCDF 3 files that do not use HDF5.

    -

    least_significant_digit: If this or significant_digits are specified, -variable data will be truncated (quantized).
    +

    least_significant_digit: If this or significant_digits are specified, +variable data will be truncated (quantized). In conjunction with compression='zlib' this produces 'lossy', but significantly more efficient compression. For example, if least_significant_digit=1, data will be quantized using @@ -2923,7 +2943,7 @@

    In-memory (diskless) Datasets

    so that a precision of 0.1 is retained (in this case bits=4). Default is None, or no quantization.

    -

    significant_digits: New in version 1.6.0. +

    significant_digits: New in version 1.6.0. As described for least_significant_digit except the number of significant digits retained is prescribed independent of the floating point exponent. Default None - no quantization done.

    @@ -2931,7 +2951,7 @@

    In-memory (diskless) Datasets

    quantize_mode: New in version 1.6.0. Controls the quantization algorithm (default 'BitGroom', 'BitRound' and 'GranularBitRound' also available). The 'GranularBitRound' -algorithm may result in better compression for typical geophysical datasets. +algorithm may result in better compression for typical geophysical datasets. Ignored if significant_digits not specified. If 'BitRound' is used, then significant_digits is interpreted as binary (not decimal) digits.
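    A short sketch of how the quantization kwargs combine with compression (names are illustrative; significant_digits and quantize_mode need netcdf-c 4.9.0 with quantization support):

        import numpy as np
        from netCDF4 import Dataset

        nc = Dataset("quantize_example.nc", "w")
        nc.createDimension("n", 100000)
        # keep ~4 significant decimal digits with the default 'BitGroom' algorithm
        v1 = nc.createVariable("t_bitgroom", "f4", ("n",), compression="zlib",
                               significant_digits=4)
        # 'BitRound' interprets significant_digits as binary digits (bits)
        v2 = nc.createVariable("t_bitround", "f4", ("n",), compression="zlib",
                               significant_digits=10, quantize_mode="BitRound")
        # older-style truncation relative to the decimal point
        v3 = nc.createVariable("t_lsd", "f4", ("n",), compression="zlib",
                               least_significant_digit=1)
        data = 100.0 * np.random.random_sample(100000)
        v1[:], v2[:], v3[:] = data, data, data
        nc.close()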

    @@ -2942,8 +2962,8 @@

    In-memory (diskless) Datasets

    in the dictionary netCDF4.default_fillvals.

    chunk_cache: If specified, sets the chunk cache size for this variable. -Persists as long as Dataset is open. Use set_var_chunk_cache to -change it when Dataset is re-opened.

    +Persists as long as Dataset is open. Use set_var_chunk_cache to +change it when Dataset is re-opened.
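    An illustrative use of the chunk_cache kwarg; the set_var_chunk_cache call below, and its size keyword, are assumed from the method named here rather than shown in this patch:

        from netCDF4 import Dataset

        nc = Dataset("chunked.nc", "w")
        nc.createDimension("x", 4096)
        nc.createDimension("y", 4096)
        v = nc.createVariable("z", "f4", ("x", "y"), compression="zlib",
                              chunksizes=(256, 256),
                              chunk_cache=8 * 1024 * 1024)  # 8 MiB, kept while this Dataset is open
        nc.close()

        nc = Dataset("chunked.nc", "a")
        nc.variables["z"].set_var_chunk_cache(size=8 * 1024 * 1024)  # assumed keyword
        nc.close()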

    Note: Variable instances should be created using the Dataset.createVariable method of a Dataset or From bb03d0d088d7d6585c8f303932984b3a0b718cd3 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 20:38:18 -0600 Subject: [PATCH 35/92] update docstring --- docs/index.html | 14 ++++++-------- src/netCDF4/_netCDF4.pyx | 12 +++++------- 2 files changed, 11 insertions(+), 15 deletions(-) diff --git a/docs/index.html b/docs/index.html index 5304c3111..0fba2894e 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1612,7 +1612,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -2092,9 +2092,8 @@

    In-memory (diskless) Datasets

    If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently 'zlib','zstd','bzip2','blosc_' are supported -(where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). -Default is None (no compression).

    +Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd +and blosc_snappy are supported. Default is None (no compression).

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2879,10 +2878,9 @@

    In-memory (diskless) Datasets

    (defined previously with createDimension). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).

    -

    compression: compression algorithm to use. Default None. Currently -'zlib','zstd','bzip2','blosc_' are supported -(where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). -Default is None (no compression).

    +

    compression: compression algorithm to use. Default None.
    +Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd +and blosc_snappy are supported. Default is None (no compression).

    zlib: if True, data assigned to the Variable instance is compressed on disk. Default False. Deprecated - use diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 5bb4bffc5..8c79020d8 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2686,9 +2686,8 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently 'zlib','zstd','bzip2','blosc_' are supported -(where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). -Default is `None` (no compression). +Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd +and blosc_snappy are supported. Default is `None` (no compression). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -3700,10 +3699,9 @@ behavior is similar to Fortran or Matlab, but different than numpy. (defined previously with `createDimension`). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions). - **`compression`**: compression algorithm to use. Default None. Currently - 'zlib','zstd','bzip2','blosc_' are supported - (where can be one of lz,lz4,lz4hc,zlib,zstd,snappy). - Default is `None` (no compression). + **`compression`**: compression algorithm to use. Default None. + Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd + and blosc_snappy are supported. Default is `None` (no compression). **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. Deprecated - use From c2fcf145d8fc64dea02345eef3e794a50ea93663 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 24 Apr 2022 20:49:38 -0600 Subject: [PATCH 36/92] update docstrings --- docs/index.html | 20 +++++++++++--------- src/netCDF4/_netCDF4.pyx | 14 ++++++++------ 2 files changed, 19 insertions(+), 15 deletions(-) diff --git a/docs/index.html b/docs/index.html index 0fba2894e..3e00dd2f9 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1612,7 +1612,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -2092,8 +2092,9 @@

    In-memory (diskless) Datasets

    If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd -and blosc_snappy are supported. Default is None (no compression).

    +Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, +blosc_zlib,blosc_zstd and blosc_snappy are supported. +Default is None (no compression).

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2108,9 +2109,9 @@

    In-memory (diskless) Datasets

    significantly improves compression. Default is True. Ignored if zlib=False.

    -

    The optional kwargs 'blosc_shuffleandblosc_blocksizeare ignored -unless the blosc compressor is used.blosc_shufflecan be 0 (no shuffle), -1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0.blosc_blocksize` +

    The optional kwargs blosc_shuffle and blosc_blocksize are ignored +unless the blosc compressor is used. blosc_shuffle can be 0 (no shuffle), +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. blosc_blocksize is the tunable blosc blocksize in bytes (Default 0 means the blocksize is chosen internally).

    @@ -2878,9 +2879,10 @@

    In-memory (diskless) Datasets

    (defined previously with createDimension). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions).

    -

    compression: compression algorithm to use. Default None.
    -Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd -and blosc_snappy are supported. Default is None (no compression).

    +

    compression: compression algorithm to use. +Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, +blosc_zlib,blosc_zstd and blosc_snappy are supported. +Default is None (no compression).

    zlib: if True, data assigned to the Variable instance is compressed on disk. Default False. Deprecated - use diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 8c79020d8..8ca25e00d 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2686,8 +2686,9 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd -and blosc_snappy are supported. Default is `None` (no compression). +Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, +`blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. +Default is `None` (no compression). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -2702,7 +2703,7 @@ will be applied before compressing the data (default `True`). This significantly improves compression. Default is `True`. Ignored if `zlib=False`. -The optional kwargs 'blosc_shuffle` and `blosc_blocksize` are ignored +The optional kwargs `blosc_shuffle` and `blosc_blocksize` are ignored unless the blosc compressor is used. `blosc_shuffle` can be 0 (no shuffle), 1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. `blosc_blocksize` is the tunable blosc blocksize in bytes (Default 0 means the blocksize is @@ -3699,9 +3700,10 @@ behavior is similar to Fortran or Matlab, but different than numpy. (defined previously with `createDimension`). Default is an empty tuple which means the variable is a scalar (and therefore has no dimensions). - **`compression`**: compression algorithm to use. Default None. - Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,blosc_zlib,blosc_zstd - and blosc_snappy are supported. Default is `None` (no compression). + **`compression`**: compression algorithm to use. + Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, + `blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. + Default is `None` (no compression). **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. Deprecated - use From c23fc9ed23c4d5d1666e16678c5a03251a692156 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Mon, 25 Apr 2022 06:47:30 -0600 Subject: [PATCH 37/92] remove blosc test if blosc filter not supported --- test/run_all.py | 3 +++ 1 file changed, 3 insertions(+) diff --git a/test/run_all.py b/test/run_all.py index df462c10d..447045950 100755 --- a/test/run_all.py +++ b/test/run_all.py @@ -31,6 +31,9 @@ if not __has_bzip2_support__: test_files.remove('tst_compression_bzip2.py') sys.stdout.write('not running tst_compression_bzip2.py ...\n') +if not __has_blosc_support__: + test_files.remove('tst_compression_blosc.py') + sys.stdout.write('not running tst_compression_bzip2.py ...\n') # Don't run tests that require network connectivity if os.getenv('NO_NET'): From 4ec13583b2c26c8d10facfbbf1b6b5efd767608c Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Mon, 25 Apr 2022 08:30:17 -0600 Subject: [PATCH 38/92] update docs --- docs/index.html | 14 +++++++++++--- src/netCDF4/_netCDF4.pyx | 12 ++++++++++-- 2 files changed, 21 insertions(+), 5 deletions(-) diff --git a/docs/index.html b/docs/index.html index 3e00dd2f9..4787881a1 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1612,7 +1612,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -2094,7 +2094,11 @@

    In-memory (diskless) Datasets

    compressed in the netCDF file using the specified compression algorithm. Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib,blosc_zstd and blosc_snappy are supported. -Default is None (no compression).
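    Since HDF5 reads HDF5_PLUGIN_PATH from the process environment, it can be exported in the shell before Python starts, or set early in the script before any filtered variable is touched. A sketch with an illustrative path (adjust to wherever the netcdf-c plugins were installed):

        import os
        # must be visible before the HDF5 library tries to load filter plugins
        os.environ.setdefault("HDF5_PLUGIN_PATH", "/path/to/netcdf-c/plugins/.libs")

        from netCDF4 import Dataset
        nc = Dataset("zstd_example.nc", "w")
        nc.createDimension("n", 1000)
        nc.createVariable("data", "f4", ("n",), compression="zstd", complevel=4)
        nc.close()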

    +Default is None (no compression). All of the compressors except +zlib use the HDF5 plugin architecture, which requires that the +environment variable HDF5_PLUGIN_PATH be set to the location of the +plugins built by netcdf-c (unless the plugins are installed in the +default location /usr/local/hdf5/lib).

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2882,7 +2886,11 @@

    In-memory (diskless) Datasets

    compression: compression algorithm to use. Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib,blosc_zstd and blosc_snappy are supported. -Default is None (no compression).

    +Default is None (no compression). All of the compressors except +zlib use the HDF5 plugin architecture, which requires that the +environment variable HDF5_PLUGIN_PATH be set to the location of the +plugins built by netcdf-c (unless the plugins are installed in the +default location /usr/local/hdf5/lib).

    zlib: if True, data assigned to the Variable instance is compressed on disk. Default False. Deprecated - use diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 8ca25e00d..761f3aab6 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2688,7 +2688,11 @@ If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. -Default is `None` (no compression). +Default is `None` (no compression). All of the compressors except +`zlib` use the HDF5 plugin architecture, which requires that the +environment variable `HDF5_PLUGIN_PATH` be set to the location of the +plugins built by netcdf-c (unless the plugins are installed in the +default location `/usr/local/hdf5/lib`). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -3703,7 +3707,11 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`compression`**: compression algorithm to use. Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. - Default is `None` (no compression). + Default is `None` (no compression). All of the compressors except + `zlib` use the HDF5 plugin architecture, which requires that the + environment variable `HDF5_PLUGIN_PATH` be set to the location of the + plugins built by netcdf-c (unless the plugins are installed in the + default location `/usr/local/hdf5/lib`). **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. Deprecated - use From 2b1a4bd325f834cfa5630dbd7c08b65e66c086ba Mon Sep 17 00:00:00 2001 From: jswhit Date: Mon, 25 Apr 2022 14:02:56 -0600 Subject: [PATCH 39/92] update --- README.md | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/README.md b/README.md index b348c8f88..5158284a3 100644 --- a/README.md +++ b/README.md @@ -13,11 +13,9 @@ For details on the latest updates, see the [Changelog](https://github.com/Unidat ??/??/2022: Version [1.6.0](https://pypi.python.org/pypi/netCDF4/1.6.0) released. Support for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead -of just dimension names). 'compression' kwarg added to Dataset.createVariable (in preparation for -the available of new compression algorithms, such as - [zstd](https://github.com/facebook/zstd), in netcdf-c). Currently only 'zlib' supported. -Opening a Dataset in 'append' mode now creates one if it doesn't already exist (just -like python open). Working arm64 wheels for Apple M1 Silicon now available on pypi. +of just dimension names). 'compression' kwarg added to Dataset.createVariable to support +new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such +as zstd, bzip and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. 
From ec91dff9a815db5646f35657fd01dad40cbbbdd1 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Mon, 25 Apr 2022 20:45:58 -0600 Subject: [PATCH 40/92] update --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 5158284a3..99171e273 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead of just dimension names). 'compression' kwarg added to Dataset.createVariable to support new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such -as zstd, bzip and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. +as zstd, bzip2 and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. From 4dee2b8e7b7f1ae79b74f212d4df7a3abfdd69e6 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 07:35:01 -0600 Subject: [PATCH 41/92] remove snappy support in blosc (since it's deprecated) --- src/netCDF4/_netCDF4.pyx | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 761f3aab6..1625da1a2 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2687,7 +2687,7 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, -`blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. +`blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` use the HDF5 plugin architecture, which requires that the environment variable `HDF5_PLUGIN_PATH` be set to the location of the @@ -3706,7 +3706,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`compression`**: compression algorithm to use. Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, - `blosc_zlib`,`blosc_zstd` and `blosc_snappy` are supported. + `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` use the HDF5 plugin architecture, which requires that the environment variable `HDF5_PLUGIN_PATH` be set to the location of the @@ -3822,7 +3822,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. blosc_lz = False blosc_lz4 = False blosc_lz4hc = False - blosc_snappy = False + #blosc_snappy = False blosc_zlib = False blosc_zstd = False if compression == 'zlib': @@ -3837,8 +3837,8 @@ behavior is similar to Fortran or Matlab, but different than numpy. blosc_lz4 = True elif compression == 'blosc_lz4hc': blosc_lz4hc = True - elif compression == 'blosc_snappy': - blosc_snappy = True + #elif compression == 'blosc_snappy': + # blosc_snappy = True elif compression == 'blosc_zlib': blosc_zlib = True elif compression == 'blosc_zstd': @@ -4004,8 +4004,9 @@ version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-pyt compression='bzip2' only works with netcdf-c >= 4.9.0. 
To enable, install Cython, make sure you have version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python.""" raise ValueError(msg) - if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or\ - blosc_zstd or blosc_snappy: + if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or + blosc_zstd: + #blosc_zstd or blosc_snappy: IF HAS_BLOSC_SUPPORT: iblosc_compressor = _blosc_dict[compression] iblosc_shuffle = blosc_shuffle From de62007b4354801d3b5073bdd612fa5b6fd0dfcf Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 07:46:50 -0600 Subject: [PATCH 42/92] test all blosc filters --- test/tst_compression_blosc.py | 73 ++++++++++++++++++++++------------- 1 file changed, 47 insertions(+), 26 deletions(-) diff --git a/test/tst_compression_blosc.py b/test/tst_compression_blosc.py index 4324f26c1..591a4c260 100644 --- a/test/tst_compression_blosc.py +++ b/test/tst_compression_blosc.py @@ -4,50 +4,71 @@ import os, tempfile, unittest ndim = 100000 -filename1 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name -filename2 = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name -array = uniform(size=(ndim,)) +filename = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +datarr = uniform(size=(ndim,)) def write_netcdf(filename,dtype='f8',complevel=6): nc = Dataset(filename,'w') nc.createDimension('n', ndim) foo = nc.createVariable('data',\ + dtype,('n'),compression=None) + foo_lz = nc.createVariable('data_lz',\ + dtype,('n'),compression='blosc_lz',blosc_shuffle=2,complevel=complevel) + foo_lz4 = nc.createVariable('data_lz4',\ dtype,('n'),compression='blosc_lz4',blosc_shuffle=2,complevel=complevel) - foo[:] = array + foo_lz4hc = nc.createVariable('data_lz4hc',\ + dtype,('n'),compression='blosc_lz4hc',blosc_shuffle=2,complevel=complevel) + foo_zlib = nc.createVariable('data_zlib',\ + dtype,('n'),compression='blosc_zlib',blosc_shuffle=2,complevel=complevel) + foo_zstd = nc.createVariable('data_zstd',\ + dtype,('n'),compression='blosc_zstd',blosc_shuffle=2,complevel=complevel) + foo_lz[:] = datarr + foo_lz4[:] = datarr + foo_lz4hc[:] = datarr + foo_zlib[:] = datarr + foo_zstd[:] = datarr nc.close() class CompressionTestCase(unittest.TestCase): def setUp(self): - self.filename1 = filename1 - self.filename2 = filename2 - write_netcdf(self.filename1,complevel=0) # no compression - write_netcdf(self.filename2,complevel=4) # with compression + self.filename = filename + write_netcdf(self.filename,complevel=4) # with compression def tearDown(self): # Remove the temporary files - os.remove(self.filename1) - os.remove(self.filename2) + os.remove(self.filename) def runTest(self): - uncompressed_size = os.stat(self.filename1).st_size - # check uncompressed data - f = Dataset(self.filename1) - size = os.stat(self.filename1).st_size - assert_almost_equal(array,f.variables['data'][:]) + f = Dataset(self.filename) + assert_almost_equal(datarr,f.variables['data'][:]) assert f.variables['data'].filters() ==\ {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} - assert_almost_equal(size,uncompressed_size) - f.close() - # check compressed data. 
- f = Dataset(self.filename2) - size = os.stat(self.filename2).st_size - assert_almost_equal(array,f.variables['data'][:]) - dtest= {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc':\ - {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000},\ - 'shuffle': False, 'complevel': 4, 'fletcher32': False} - assert f.variables['data'].filters() == dtest - assert(size < 0.96*uncompressed_size) + assert_almost_equal(datarr,f.variables['data_lz'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + {'compressor': 'blosc_lz', 'shuffle': 2, 'blocksize': 800000}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data_lz'].filters() == dtest + assert_almost_equal(datarr,f.variables['data_lz4'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data_lz4'].filters() == dtest + assert_almost_equal(datarr,f.variables['data_lz4hc'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + {'compressor': 'blosc_lz4hc', 'shuffle': 2, 'blocksize': 800000}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data_lz4hc'].filters() == dtest + assert_almost_equal(datarr,f.variables['data_zlib'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + {'compressor': 'blosc_zlib', 'shuffle': 2, 'blocksize': 800000}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data_zlib'].filters() == dtest + assert_almost_equal(datarr,f.variables['data_zstd'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + {'compressor': 'blosc_zstd', 'shuffle': 2, 'blocksize': 800000}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + assert f.variables['data_zstd'].filters() == dtest f.close() if __name__ == '__main__': From 313524f16e58ec1c5a9b00ea7e23d27b5452288c Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 07:51:28 -0600 Subject: [PATCH 43/92] turn blosc shuffle on by default (as in pytables) --- src/netCDF4/_netCDF4.pyx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 1625da1a2..da328d18b 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2645,7 +2645,7 @@ datatype.""" def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, - blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, + blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None): """ @@ -2709,7 +2709,7 @@ significantly improves compression. Default is `True`. Ignored if The optional kwargs `blosc_shuffle` and `blosc_blocksize` are ignored unless the blosc compressor is used. `blosc_shuffle` can be 0 (no shuffle), -1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. `blosc_blocksize` +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1. `blosc_blocksize` is the tunable blosc blocksize in bytes (Default 0 means the blocksize is chosen internally). @@ -3665,13 +3665,13 @@ behavior is similar to Fortran or Matlab, but different than numpy. 
def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, + complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): """ **`__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False, + complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)`** @@ -3727,7 +3727,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`blosc_shuffle`**: shuffle filter inside blosc compressor (only relevant if compression kwarg set to one of the blosc compressors). Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise - shuffle)). Default is 0. + shuffle)). Default is 1. **`blosc_blocksize`**: tunable blocksize in bytes for blosc compressors. Default of 0 means blosc library chooses a blocksize. From c29d9c23471e08505ea61338bfbb9b9c04d8db94 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 08:10:17 -0600 Subject: [PATCH 44/92] update --- src/netCDF4/_netCDF4.pyx | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index da328d18b..6b8744741 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4004,9 +4004,7 @@ version 4.9.0 or higher netcdf-c with zstandard support, and rebuild netcdf4-pyt compression='bzip2' only works with netcdf-c >= 4.9.0. To enable, install Cython, make sure you have version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python.""" raise ValueError(msg) - if blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib or - blosc_zstd: - #blosc_zstd or blosc_snappy: + if blosc_zstd or blosc_lz or blosc_lz4 or blosc_lz4hc or blosc_zlib: IF HAS_BLOSC_SUPPORT: iblosc_compressor = _blosc_dict[compression] iblosc_shuffle = blosc_shuffle From dbd6450761d5fd64f0f8169af3f9e2082f2874bc Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 09:15:53 -0600 Subject: [PATCH 45/92] update --- Changelog | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Changelog b/Changelog index bbb3dfe03..e8272c510 100644 --- a/Changelog +++ b/Changelog @@ -13,8 +13,8 @@ directive to 3 in setup.py. * add 'compression' kwarg to createVariable to enable new compression functionality in netcdf-c 4.9.0. 'None','zlib','zstd','bzip2' - 'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib','blosc_zstd' and - 'blosc_snappy' are currently supported. 'blosc_shuffle' and + 'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib' and 'blosc_zstd' + are currently supported. 'blosc_shuffle' and 'blosc_blocksize' kwargs also added. compression='zlib' is equivalent to (the now deprecated) zlib=True. * MFDataset did not aggregate 'name' variable attribute (issue #1153). 
From f896a2a3cf099710094c4f7e35c0907e8dbd66be Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Wed, 27 Apr 2022 09:18:02 -0600 Subject: [PATCH 46/92] update docs --- docs/index.html | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/index.html b/docs/index.html index 4787881a1..cf86fbfa2 100644 --- a/docs/index.html +++ b/docs/index.html @@ -1612,7 +1612,7 @@

 (hunks of docs/index.html rendered as plain text; the HTML markup was stripped, so a -/+ pair showing identical text differs only in markup)
 In-memory (diskless) Datasets
 the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.
-contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov
+contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov
 copyright: 2008 by Jeffrey Whitaker.
@@ -2093,7 +2093,7 @@
 In-memory (diskless) Datasets
 If the optional keyword argument compression is set, the data
 will be compressed in the netCDF file using the specified compression
 algorithm. Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,
-blosc_zlib,blosc_zstd and blosc_snappy are supported.
+blosc_zlib and blosc_zstd are supported.
 Default is None (no compression).  All of the compressors except
 zlib use the HDF5 plugin architecture, which requires that the
 environment variable HDF5_PLUGIN_PATH be set to the location of the
@@ -2115,7 +2115,7 @@
 In-memory (diskless) Datasets
 The optional kwargs blosc_shuffle and blosc_blocksize are ignored unless
 the blosc compressor is used. blosc_shuffle can be 0 (no shuffle),
-1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 0. blosc_blocksize
+1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1. blosc_blocksize
 is the tunable blosc blocksize in bytes (Default 0 means the blocksize
 is chosen internally).
@@ -2850,7 +2850,7 @@
 In-memory (diskless) Datasets
 __init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False,
-complevel=4, shuffle=True, blosc_shuffle=0, blosc_blocksize=0, fletcher32=False, contiguous=False,
+complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False,
 chunksizes=None, endian='native',
 least_significant_digit=None,fill_value=None,chunk_cache=None)
@@ -2885,7 +2885,7 @@
 In-memory (diskless) Datasets
 compression: compression algorithm to use. Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc,
-blosc_zlib,blosc_zstd and blosc_snappy are supported.
+blosc_zlib and blosc_zstd are supported.
 Default is None (no compression).  All of the compressors except
 zlib use the HDF5 plugin architecture, which requires that the
 environment variable HDF5_PLUGIN_PATH be set to the location of the
@@ -2906,7 +2906,7 @@
 In-memory (diskless) Datasets
 blosc_shuffle: shuffle filter inside blosc compressor (only
 relevant if compression kwarg set to one of the blosc compressors).
 Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise
-shuffle)). Default is 0.
+shuffle)). Default is 1.

    blosc_blocksize: tunable blocksize in bytes for blosc compressors. Default of 0 means blosc library chooses a blocksize.
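As a usage note for the HDF5_PLUGIN_PATH requirement documented in the hunks above, a hedged sketch (assumes a netcdf-c 4.9.0+ build with the zstandard plugin; the plugin directory shown is an assumption — point it at wherever your netcdf-c install placed its filter plugins, and prefer exporting the variable in the shell before starting Python):

    import os

    # Assumed plugin directory -- adjust for your netcdf-c installation; not needed if the
    # plugins already live in the default location noted above.
    os.environ.setdefault("HDF5_PLUGIN_PATH", "/usr/local/hdf5/lib/plugin")

    from netCDF4 import Dataset  # imported after the environment variable is set

    nc = Dataset("zstd_example.nc", "w")   # illustrative file name
    nc.createDimension("n", 1000)
    var = nc.createVariable("data", "f8", ("n",), compression="zstd")
    print(var.filters())                   # reports whether the zstd filter is active
    nc.close()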

    From f4ec3f265c905e230492b503024056b54325df8a Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 27 Apr 2022 11:47:26 -0600 Subject: [PATCH 47/92] update docstrings --- src/netCDF4/_netCDF4.pyx | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 6b8744741..abe44b7b8 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2055,7 +2055,7 @@ strings. **`mode`**: access mode. `r` means read-only; no data can be modified. `w` means write; a new file is created, an existing file with - the same name is deleted. 'x' means write, but fail if an existing + the same name is deleted. `x` means write, but fail if an existing file with the same name already exists. `a` and `r+` mean append; an existing file is opened for reading and writing, if file does not exist already, one is created. @@ -2070,7 +2070,7 @@ strings. **`clobber`**: if `True` (default), opening a file with `mode='w'` will clobber an existing file with the same name. if `False`, an exception will be raised if a file with the same name already exists. - mode='x' is identical to mode='w' with clobber=False. + mode=`x` is identical to mode=`w` with clobber=False. **`format`**: underlying file format (one of `'NETCDF4', 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC'`, `'NETCDF3_64BIT_OFFSET'` or @@ -2113,14 +2113,14 @@ strings. rendered unusable when the parent Dataset instance is garbage collected. **`memory`**: if not `None`, create or open an in-memory Dataset. - If mode = 'r', the memory kwarg must contain a memory buffer object + If mode = `r`, the memory kwarg must contain a memory buffer object (an object that supports the python buffer interface). The Dataset will then be created with contents taken from this block of memory. - If mode = 'w', the memory kwarg should contain the anticipated size + If mode = `w`, the memory kwarg should contain the anticipated size of the Dataset in bytes (used only for NETCDF3 files). A memory buffer containing a copy of the Dataset is returned by the - `Dataset.close` method. Requires netcdf-c version 4.4.1 for mode='r, - netcdf-c 4.6.2 for mode='w'. To persist the file to disk, the raw + `Dataset.close` method. Requires netcdf-c version 4.4.1 for mode=`r` + netcdf-c 4.6.2 for mode=`w`. To persist the file to disk, the raw bytes from the returned buffer can be written into a binary file. The Dataset can also be re-opened using this memory buffer. @@ -2334,7 +2334,7 @@ strings. 
else: ierr = nc_create(path, NC_SHARE | NC_NOCLOBBER, &grpid) else: - raise ValueError("mode must be 'w', 'r', 'a' or 'r+', got '%s'" % mode) + raise ValueError("mode must be 'w', 'x', 'r', 'a' or 'r+', got '%s'" % mode) _ensure_nc_success(ierr, err_cls=IOError, filename=path) From 7c6893add230254bd52de6b6efa5e464d64be3a2 Mon Sep 17 00:00:00 2001 From: jswhit Date: Fri, 29 Apr 2022 12:27:59 -0600 Subject: [PATCH 48/92] dont fail in filter method if plugin not found --- src/netCDF4/_netCDF4.pyx | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index abe44b7b8..145e36a7c 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4419,15 +4419,18 @@ return dictionary containing HDF5 filter parameters.""" IF HAS_ZSTANDARD_SUPPORT: ierr = nc_inq_var_zstandard(self._grpid, self._varid, &izstd,\ &icomplevel_zstd) - _ensure_nc_success(ierr) + if ierr != 0: izstd=0 + # _ensure_nc_success(ierr) IF HAS_BZIP2_SUPPORT: ierr = nc_inq_var_bzip2(self._grpid, self._varid, &ibzip2,\ &icomplevel_bzip2) - _ensure_nc_success(ierr) + if ierr != 0: ibzip2=0 + #_ensure_nc_success(ierr) IF HAS_BLOSC_SUPPORT: ierr = nc_inq_var_blosc(self._grpid, self._varid, &iblosc,\ &iblosc_compressor,&iblosc_complevel,&iblosc_blocksize,&iblosc_shuffle) - _ensure_nc_success(ierr) + if ierr != 0: iblosc=0 + #_ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel @@ -4438,7 +4441,6 @@ return dictionary containing HDF5 filter parameters.""" filtdict['bzip2']=True filtdict['complevel']=icomplevel_bzip2 if iblosc: - #filtdict['blosc']=True blosc_compressor = iblosc_compressor filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} filtdict['complevel']=iblosc_complevel From 534fc0ef0ad7011ddd26e5b3daef3dc49281a331 Mon Sep 17 00:00:00 2001 From: jswhit Date: Fri, 29 Apr 2022 13:18:11 -0600 Subject: [PATCH 49/92] update --- Changelog | 2 ++ 1 file changed, 2 insertions(+) diff --git a/Changelog b/Changelog index e8272c510..2128d23bd 100644 --- a/Changelog +++ b/Changelog @@ -17,6 +17,8 @@ are currently supported. 'blosc_shuffle' and 'blosc_blocksize' kwargs also added. compression='zlib' is equivalent to (the now deprecated) zlib=True. + Using new compressors requires setting HDF5_PLUGIN_PATH to point to + the installation location of the netcdf-c filter plugins. * MFDataset did not aggregate 'name' variable attribute (issue #1153). 
* issue warning instead of raising an exception if missing_value or _FillValue can't be cast to the variable type when creating a From 97a70f802b1ee72e52cd369a6501080bd0225ed8 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 13:32:36 -0700 Subject: [PATCH 50/92] add szip compression support --- include/netCDF4.pxi | 9 +++++-- setup.py | 14 +++++++++-- src/netCDF4/_netCDF4.pyx | 48 ++++++++++++++++++++++++++++++------ test/run_all.py | 6 ++++- test/tst_compression_szip.py | 45 +++++++++++++++++++++++++++++++++ 5 files changed, 109 insertions(+), 13 deletions(-) create mode 100644 test/tst_compression_szip.py diff --git a/include/netCDF4.pxi b/include/netCDF4.pxi index a5ff730be..8be538234 100644 --- a/include/netCDF4.pxi +++ b/include/netCDF4.pxi @@ -216,8 +216,6 @@ cdef extern from "netcdf.h": NC_ENDIAN_NATIVE NC_ENDIAN_LITTLE NC_ENDIAN_BIG - NC_SZIP_EC_OPTION_MASK # entropy encoding - NC_SZIP_NN_OPTION_MASK # nearest neighbor encoding const_char_ptr *nc_inq_libvers() nogil const_char_ptr *nc_strerror(int ncerr) int nc_create(char *path, int cmode, int *ncidp) @@ -701,6 +699,13 @@ IF HAS_QUANTIZATION_SUPPORT: int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd) int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp) nogil +IF HAS_SZIP_SUPPORT: + cdef extern from "netcdf.h": + int nc_def_var_quantize(int ncid, int varid, int quantize_mode, int nsd) + int nc_inq_var_quantize(int ncid, int varid, int *quantize_modep, int *nsdp) nogil + int nc_def_var_szip(int ncid, int varid, int options_mask, int pixels_per_bloc) + int nc_inq_var_szip(int ncid, int varid, int *options_maskp, int *pixels_per_blockp) + IF HAS_ZSTANDARD_SUPPORT: cdef extern from "netcdf_filter.h": cdef enum: diff --git a/setup.py b/setup.py index 5213af649..9cf48528e 100644 --- a/setup.py +++ b/setup.py @@ -65,6 +65,7 @@ def check_api(inc_dirs,netcdf_lib_version): has_parallel_support = False has_parallel4_support = False has_pnetcdf_support = False + has_szip_support = False has_quantize = False has_zstandard = False has_bzip2 = False @@ -124,6 +125,8 @@ def check_api(inc_dirs,netcdf_lib_version): has_parallel4_support = bool(int(line.split()[2])) if line.startswith('#define NC_HAS_PNETCDF'): has_pnetcdf_support = bool(int(line.split()[2])) + if line.startswith('#define NC_HAS_SZIP_WRITE'): + has_szip_support = bool(int(line.split()[2])) # NC_HAS_PARALLEL4 missing in 4.6.1 (issue #964) if not has_parallel4_support and has_parallel_support and not has_pnetcdf_support: has_parallel4_support = True @@ -136,7 +139,7 @@ def check_api(inc_dirs,netcdf_lib_version): return has_rename_grp, has_nc_inq_path, has_nc_inq_format_extended, \ has_cdf5_format, has_nc_open_mem, has_nc_create_mem, \ - has_parallel4_support, has_pnetcdf_support, has_quantize, \ + has_parallel4_support, has_pnetcdf_support, has_szip_support, has_quantize, \ has_zstandard, has_bzip2, has_blosc @@ -550,7 +553,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): # this determines whether renameGroup and filepath methods will work. has_rename_grp, has_nc_inq_path, has_nc_inq_format_extended, \ has_cdf5_format, has_nc_open_mem, has_nc_create_mem, \ - has_parallel4_support, has_pnetcdf_support, has_quantize, \ + has_parallel4_support, has_pnetcdf_support, has_szip_support, has_quantize, \ has_zstandard, has_bzip2, has_blosc = \ check_api(inc_dirs,netcdf_lib_version) # for netcdf 4.4.x CDF5 format is always enabled. 
@@ -651,6 +654,13 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): sys.stdout.write('netcdf lib does not have blosc compression functions\n') f.write('DEF HAS_BLOSC_SUPPORT = 0\n') + if has_szip_support: + sys.stdout.write('netcdf lib has szip compression functions\n') + f.write('DEF HAS_SZIP_SUPPORT = 1\n') + else: + sys.stdout.write('netcdf lib does not have szip compression functions\n') + f.write('DEF HAS_SZIP_SUPPORT = 0\n') + f.close() if has_parallel4_support or has_pnetcdf_support: diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 145e36a7c..415bdcd2d 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1372,6 +1372,8 @@ _cmode_dict = {'NETCDF3_CLASSIC' : NC_CLASSIC_MODEL, # dicts for blosc compressors. _blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} _blosc_dict_inv = {v: k for k, v in _blosc_dict.items()} +_szip_dict = {'ec': 4, 'nn': 32} +_szip_dict_inv = {v: k for k, v in _szip_dict.items()} IF HAS_CDF5_FORMAT: # NETCDF3_64BIT deprecated, saved for compatibility. # use NETCDF3_64BIT_OFFSET instead. @@ -2645,12 +2647,14 @@ datatype.""" def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, + szip_mask='nn',szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None): """ **`createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, +szip_mask='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)`** @@ -2686,10 +2690,10 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, +Currently `zlib`,`szip`, `zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except -`zlib` use the HDF5 plugin architecture, which requires that the +`zlib` and `szip` use the HDF5 plugin architecture, which requires that the environment variable `HDF5_PLUGIN_PATH` be set to the location of the plugins built by netcdf-c (unless the plugins are installed in the default location `/usr/local/hdf5/lib`). @@ -2703,7 +2707,7 @@ the level of compression desired (default 4). Ignored if `compression=None`. A value of zero disables compression. If the optional keyword `shuffle` is `True`, the HDF5 shuffle filter -will be applied before compressing the data (default `True`). This +will be applied before compressing the data with zlib (default `True`). This significantly improves compression. Default is `True`. Ignored if `zlib=False`. @@ -2824,6 +2828,7 @@ is the number of variable dimensions.""" # create variable. 
group.variables[varname] = Variable(group, varname, datatype, dimensions=dimensions, compression=compression, zlib=zlib, complevel=complevel, shuffle=shuffle, + szip_mask=szip_mask, szip_pixels_per_block=szip_pixels_per_block, blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize, fletcher32=fletcher32, contiguous=contiguous, chunksizes=chunksizes, endian=endian, least_significant_digit=least_significant_digit, @@ -3665,13 +3670,15 @@ behavior is similar to Fortran or Matlab, but different than numpy. def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, + complevel=4, shuffle=True, szip_mask='nn', szip_pixels_per_block=8, + blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): """ **`__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, + complevel=4, shuffle=True, szip_mask='nn', szip_pixels_per_block=8, + blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)`** @@ -3705,7 +3712,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. which means the variable is a scalar (and therefore has no dimensions). **`compression`**: compression algorithm to use. - Currently `zlib`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, + Currently `zlib`,`szip`, `zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` use the HDF5 plugin architecture, which requires that the @@ -3722,7 +3729,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. Ignored if `compression=None`. A value of 0 disables compression. **`shuffle`**: if `True`, the HDF5 shuffle filter is applied - to improve compression. Default `True`. Ignored if `compression=None`. + to improve zlib compression. Default `True`. Ignored unless `compression = 'zlib'`. **`blosc_shuffle`**: shuffle filter inside blosc compressor (only relevant if compression kwarg set to one of the blosc compressors). @@ -3800,6 +3807,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. """ cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd, cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle + cdef int iszip_mask, iszip_pixels_per_block cdef char namstring[NC_MAX_NAME+1] cdef char *varname cdef nc_type xtype @@ -3817,6 +3825,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. if not complevel: compression = None zlib = False + szip = False zstd = False bzip2 = False blosc_lz = False @@ -3827,6 +3836,8 @@ behavior is similar to Fortran or Matlab, but different than numpy. blosc_zstd = False if compression == 'zlib': zlib = True + elif compression == 'szip': + szip = True elif compression == 'zstd': zstd = True elif compression == 'bzip2': @@ -3980,6 +3991,18 @@ behavior is similar to Fortran or Matlab, but different than numpy. 
if ierr != NC_NOERR: if grp.data_model != 'NETCDF4': grp._enddef() _ensure_nc_success(ierr) + if szip: + IF HAS_SZIP_SUPPORT: + iszip_mask = _szip_dict[szip_mask] + iszip_pixels_per_block = szip_pixels_per_block + ierr = nc_def_var_szip(self._grpid, self._varid, iszip_mask, iszip_pixels_per_block) + if ierr != NC_NOERR: + if grp.data_model != 'NETCDF4': grp._enddef() + _ensure_nc_success(ierr) + ELSE: + msg = """ +compression='szip' only works if linked version of hdf5 has szip functionality enabled""" + raise ValueError(msg) if zstd: IF HAS_ZSTANDARD_SUPPORT: icomplevel = complevel @@ -4407,8 +4430,10 @@ return dictionary containing HDF5 filter parameters.""" cdef int izstd=0 cdef int ibzip2=0 cdef int iblosc=0 + cdef int iszip=0 cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle - filtdict = {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + cdef int iszip_mask, iszip_pixels_per_block + filtdict = {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} if self._grp.data_model not in ['NETCDF4_CLASSIC','NETCDF4']: return with nogil: ierr = nc_inq_var_deflate(self._grpid, self._varid, &ishuffle, &ideflate, &icomplevel) @@ -4431,6 +4456,11 @@ return dictionary containing HDF5 filter parameters.""" &iblosc_compressor,&iblosc_complevel,&iblosc_blocksize,&iblosc_shuffle) if ierr != 0: iblosc=0 #_ensure_nc_success(ierr) + IF HAS_SZIP_SUPPORT: + ierr = nc_inq_var_szip(self._grpid, self._varid, &iszip_mask,\ + &iszip_pixels_per_block) + if ierr != 0: iszip=0 + #_ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel @@ -4444,6 +4474,8 @@ return dictionary containing HDF5 filter parameters.""" blosc_compressor = iblosc_compressor filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} filtdict['complevel']=iblosc_complevel + if iszip: + filtdict['szip']={'mask':_szip_dict_inv[iszip_mask],'pixels_per_block':iszip_pixels_per_block} if ishuffle: filtdict['shuffle']=True if ifletcher32: diff --git a/test/run_all.py b/test/run_all.py index 447045950..a76764d97 100755 --- a/test/run_all.py +++ b/test/run_all.py @@ -3,7 +3,8 @@ from netCDF4 import __has_cdf5_format__, __has_nc_inq_path__, __has_nc_create_mem__, \ __has_parallel4_support__, __has_pnetcdf_support__, \ __has_zstandard_support__, __has_bzip2_support__, \ - __has_blosc_support__,__has_quantization_support__ + __has_blosc_support__,__has_quantization_support__,\ + __has_szip_support__ # can also just run # python -m unittest discover . 
'tst*py' @@ -34,6 +35,9 @@ if not __has_blosc_support__: test_files.remove('tst_compression_blosc.py') sys.stdout.write('not running tst_compression_bzip2.py ...\n') +if not __has_szip_support__: + test_files.remove('tst_compression_szip.py') + sys.stdout.write('not running tst_compression_szip.py ...\n') # Don't run tests that require network connectivity if os.getenv('NO_NET'): diff --git a/test/tst_compression_szip.py b/test/tst_compression_szip.py new file mode 100644 index 000000000..a09cffbba --- /dev/null +++ b/test/tst_compression_szip.py @@ -0,0 +1,45 @@ +from numpy.random.mtrand import uniform +from netCDF4 import Dataset +from numpy.testing import assert_almost_equal +import os, tempfile, unittest + +ndim = 100000 +filename = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name +datarr = uniform(size=(ndim,)) + +def write_netcdf(filename,dtype='f8'): + nc = Dataset(filename,'w') + nc.createDimension('n', ndim) + foo = nc.createVariable('data',\ + dtype,('n'),compression=None) + foo_szip = nc.createVariable('data_szip',\ + dtype,('n'),compression='szip',szip_mask='ec',szip_pixels_per_block=32) + foo[:] = datarr + foo_szip[:] = datarr + nc.close() + +class CompressionTestCase(unittest.TestCase): + + def setUp(self): + self.filename = filename + write_netcdf(self.filename) + + def tearDown(self): + # Remove the temporary files + os.remove(self.filename) + + def runTest(self): + f = Dataset(self.filename) + assert_almost_equal(datarr,f.variables['data'][:]) + assert f.variables['data'].filters() ==\ + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + assert_almost_equal(datarr,f.variables['data_szip'][:]) + dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'szip': + {'mask': 'ec', 'pixels_per_block': 32}, + 'shuffle': False, 'complevel': 4, 'fletcher32': False} + print(f.variables['data_szip'].filters()) + #assert f.variables['data_szip'].filters() == dtest + f.close() + +if __name__ == '__main__': + unittest.main() From 9647a7df07112750d26f1d18481359fa461a4e18 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 13:33:48 -0700 Subject: [PATCH 51/92] add szip support --- src/netCDF4/__init__.py | 2 +- src/netCDF4/_netCDF4.pyx | 1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py index b79cf6323..f518b128e 100644 --- a/src/netCDF4/__init__.py +++ b/src/netCDF4/__init__.py @@ -9,6 +9,6 @@ __has_nc_create_mem__, __has_cdf5_format__, __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__, __has_zstandard_support__, - __has_bzip2_support__, __has_blosc_support__) + __has_bzip2_support__, __has_blosc_support__, __has_szip_support__) __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 415bdcd2d..8ac8dbd61 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1324,6 +1324,7 @@ __has_quantization_support__ = HAS_QUANTIZATION_SUPPORT __has_zstandard_support__ = HAS_ZSTANDARD_SUPPORT __has_bzip2_support__ = HAS_BZIP2_SUPPORT __has_blosc_support__ = HAS_BLOSC_SUPPORT +__has_szip_support__ = HAS_SZIP_SUPPORT _needsworkaround_issue485 = __netcdf4libversion__ < "4.4.0" or \ (__netcdf4libversion__.startswith("4.4.0") and \ 
"-development" in __netcdf4libversion__) From 16c220661211658512bc7e3461bbd492d953fef2 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 13:47:20 -0700 Subject: [PATCH 52/92] update --- src/netCDF4/_netCDF4.pyx | 24 ++++++++++++------------ test/tst_compression.py | 18 +++++++++--------- test/tst_compression_blosc.py | 12 ++++++------ test/tst_compression_bzip2.py | 4 ++-- test/tst_compression_szip.py | 4 ++-- test/tst_compression_zstd.py | 4 ++-- 6 files changed, 33 insertions(+), 33 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 8ac8dbd61..938514c15 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -1370,7 +1370,7 @@ _format_dict = {'NETCDF3_CLASSIC' : NC_FORMAT_CLASSIC, _cmode_dict = {'NETCDF3_CLASSIC' : NC_CLASSIC_MODEL, 'NETCDF4_CLASSIC' : NC_CLASSIC_MODEL | NC_NETCDF4, 'NETCDF4' : NC_NETCDF4} -# dicts for blosc compressors. +# dicts for blosc, szip compressors. _blosc_dict={'blosc_lz':0,'blosc_lz4':1,'blosc_lz4hc':2,'blosc_snappy':3,'blosc_zlib':4,'blosc_zstd':5} _blosc_dict_inv = {v: k for k, v in _blosc_dict.items()} _szip_dict = {'ec': 4, 'nn': 32} @@ -2648,14 +2648,14 @@ datatype.""" def createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, - szip_mask='nn',szip_pixels_per_block=8, + szip_coding='nn',szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None): """ **`createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, -szip_mask='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, +szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)`** @@ -2829,7 +2829,7 @@ is the number of variable dimensions.""" # create variable. group.variables[varname] = Variable(group, varname, datatype, dimensions=dimensions, compression=compression, zlib=zlib, complevel=complevel, shuffle=shuffle, - szip_mask=szip_mask, szip_pixels_per_block=szip_pixels_per_block, + szip_coding=szip_coding, szip_pixels_per_block=szip_pixels_per_block, blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize, fletcher32=fletcher32, contiguous=contiguous, chunksizes=chunksizes, endian=endian, least_significant_digit=least_significant_digit, @@ -3671,14 +3671,14 @@ behavior is similar to Fortran or Matlab, but different than numpy. 
def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, szip_mask='nn', szip_pixels_per_block=8, + complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): """ **`__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, - complevel=4, shuffle=True, szip_mask='nn', szip_pixels_per_block=8, + complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)`** @@ -3808,7 +3808,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. """ cdef int ierr, ndims, icontiguous, icomplevel, numdims, _grpid, nsd, cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle - cdef int iszip_mask, iszip_pixels_per_block + cdef int iszip_coding, iszip_pixels_per_block cdef char namstring[NC_MAX_NAME+1] cdef char *varname cdef nc_type xtype @@ -3994,9 +3994,9 @@ behavior is similar to Fortran or Matlab, but different than numpy. _ensure_nc_success(ierr) if szip: IF HAS_SZIP_SUPPORT: - iszip_mask = _szip_dict[szip_mask] + iszip_coding = _szip_dict[szip_coding] iszip_pixels_per_block = szip_pixels_per_block - ierr = nc_def_var_szip(self._grpid, self._varid, iszip_mask, iszip_pixels_per_block) + ierr = nc_def_var_szip(self._grpid, self._varid, iszip_coding, iszip_pixels_per_block) if ierr != NC_NOERR: if grp.data_model != 'NETCDF4': grp._enddef() _ensure_nc_success(ierr) @@ -4433,7 +4433,7 @@ return dictionary containing HDF5 filter parameters.""" cdef int iblosc=0 cdef int iszip=0 cdef unsigned int iblosc_complevel,iblosc_blocksize,iblosc_compressor,iblosc_shuffle - cdef int iszip_mask, iszip_pixels_per_block + cdef int iszip_coding, iszip_pixels_per_block filtdict = {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} if self._grp.data_model not in ['NETCDF4_CLASSIC','NETCDF4']: return with nogil: @@ -4458,7 +4458,7 @@ return dictionary containing HDF5 filter parameters.""" if ierr != 0: iblosc=0 #_ensure_nc_success(ierr) IF HAS_SZIP_SUPPORT: - ierr = nc_inq_var_szip(self._grpid, self._varid, &iszip_mask,\ + ierr = nc_inq_var_szip(self._grpid, self._varid, &iszip_coding,\ &iszip_pixels_per_block) if ierr != 0: iszip=0 #_ensure_nc_success(ierr) @@ -4476,7 +4476,7 @@ return dictionary containing HDF5 filter parameters.""" filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} filtdict['complevel']=iblosc_complevel if iszip: - filtdict['szip']={'mask':_szip_dict_inv[iszip_mask],'pixels_per_block':iszip_pixels_per_block} + filtdict['szip']={'coding':_szip_dict_inv[iszip_coding],'pixels_per_block':iszip_pixels_per_block} if ishuffle: filtdict['shuffle']=True if ifletcher32: diff --git a/test/tst_compression.py b/test/tst_compression.py index c47422910..f28ed0b96 100644 --- a/test/tst_compression.py +++ b/test/tst_compression.py @@ -89,9 +89,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - 
{'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. @@ -100,9 +100,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':6,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -111,9 +111,9 @@ def runTest(self): assert_almost_equal(array,f.variables['data'][:]) assert_almost_equal(array,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle @@ -138,9 +138,9 @@ def runTest(self): assert_almost_equal(checkarray,f.variables['data'][:]) assert_almost_equal(checkarray,f.variables['data2'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert(size < 0.20*uncompressed_size) # should be slightly larger than without fletcher32 assert(size > size_save) @@ -150,7 +150,7 @@ def runTest(self): checkarray2 = _quantize(array2,lsd) assert_almost_equal(checkarray2,f.variables['data2'][:]) assert f.variables['data2'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':6,'fletcher32':True} assert f.variables['data2'].chunking() == [chunk1,chunk2] f.close() diff --git a/test/tst_compression_blosc.py b/test/tst_compression_blosc.py index 591a4c260..66ca0e9e4 100644 --- a/test/tst_compression_blosc.py +++ b/test/tst_compression_blosc.py @@ -43,29 +43,29 @@ def 
runTest(self): f = Dataset(self.filename) assert_almost_equal(datarr,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_lz'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_lz', 'shuffle': 2, 'blocksize': 800000}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} assert f.variables['data_lz'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} assert f.variables['data_lz4'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4hc'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_lz4hc', 'shuffle': 2, 'blocksize': 800000}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} assert f.variables['data_lz4hc'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zlib'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_zlib', 'shuffle': 2, 'blocksize': 800000}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} assert f.variables['data_zlib'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zstd'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': + dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': {'compressor': 'blosc_zstd', 'shuffle': 2, 'blocksize': 800000}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} assert f.variables['data_zstd'].filters() == dtest diff --git a/test/tst_compression_bzip2.py b/test/tst_compression_bzip2.py index 8a162f20c..89de4086c 100644 --- a/test/tst_compression_bzip2.py +++ b/test/tst_compression_bzip2.py @@ -36,7 +36,7 @@ def runTest(self): size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. 
@@ -44,7 +44,7 @@ def runTest(self): size = os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':True,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':True,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) f.close() diff --git a/test/tst_compression_szip.py b/test/tst_compression_szip.py index a09cffbba..e62c8b177 100644 --- a/test/tst_compression_szip.py +++ b/test/tst_compression_szip.py @@ -13,7 +13,7 @@ def write_netcdf(filename,dtype='f8'): foo = nc.createVariable('data',\ dtype,('n'),compression=None) foo_szip = nc.createVariable('data_szip',\ - dtype,('n'),compression='szip',szip_mask='ec',szip_pixels_per_block=32) + dtype,('n'),compression='szip',szip_coding='ec',szip_pixels_per_block=32) foo[:] = datarr foo_szip[:] = datarr nc.close() @@ -35,7 +35,7 @@ def runTest(self): {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_szip'][:]) dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'szip': - {'mask': 'ec', 'pixels_per_block': 32}, + {'coding': 'ec', 'pixels_per_block': 32}, 'shuffle': False, 'complevel': 4, 'fletcher32': False} print(f.variables['data_szip'].filters()) #assert f.variables['data_szip'].filters() == dtest diff --git a/test/tst_compression_zstd.py b/test/tst_compression_zstd.py index cd9d270ef..2745e03ba 100644 --- a/test/tst_compression_zstd.py +++ b/test/tst_compression_zstd.py @@ -36,7 +36,7 @@ def runTest(self): size = os.stat(self.filename1).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(size,uncompressed_size) f.close() # check compressed data. 
@@ -44,7 +44,7 @@ def runTest(self): size = os.stat(self.filename2).st_size assert_almost_equal(array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':False,'zstd':True,'bzip2':False,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} + {'zlib':False,'szip':False,'zstd':True,'bzip2':False,'blosc':False,'shuffle':False,'complevel':4,'fletcher32':False} assert(size < 0.96*uncompressed_size) f.close() From 0ee4be01cf9e17765a31dc003ebd4e4ec62c1d09 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 13:56:26 -0700 Subject: [PATCH 53/92] update --- test/tst_compression_quant.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/test/tst_compression_quant.py b/test/tst_compression_quant.py index 9a44420c3..898f84f71 100644 --- a/test/tst_compression_quant.py +++ b/test/tst_compression_quant.py @@ -59,7 +59,7 @@ def runTest(self): #print('compressed lossless no shuffle = ',size) assert_almost_equal(data_array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':complevel,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':complevel,'fletcher32':False} assert(size < 0.95*uncompressed_size) f.close() # check compression with shuffle @@ -68,7 +68,7 @@ def runTest(self): #print('compressed lossless with shuffle ',size) assert_almost_equal(data_array,f.variables['data'][:]) assert f.variables['data'].filters() ==\ - {'zlib':True,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':complevel,'fletcher32':False} + {'zlib':True,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':True,'complevel':complevel,'fletcher32':False} assert(size < 0.85*uncompressed_size) f.close() # check lossy compression without shuffle From fbc6cc2ef12817863ae235f3593811a5261ab29e Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 14:05:21 -0700 Subject: [PATCH 54/92] update --- Changelog | 8 ++++---- README.md | 2 +- test/tst_compression_szip.py | 5 +++-- 3 files changed, 8 insertions(+), 7 deletions(-) diff --git a/Changelog b/Changelog index 2128d23bd..eebb7f378 100644 --- a/Changelog +++ b/Changelog @@ -12,12 +12,12 @@ * remove all vestiges of python 2 in _netCDF4.pyx and set cython language_level directive to 3 in setup.py. * add 'compression' kwarg to createVariable to enable new compression - functionality in netcdf-c 4.9.0. 'None','zlib','zstd','bzip2' + functionality in netcdf-c 4.9.0. 'None','zlib','szip','zstd','bzip2' 'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib' and 'blosc_zstd' - are currently supported. 'blosc_shuffle' and - 'blosc_blocksize' kwargs also added. + are currently supported. 'blosc_shuffle', 'blosc_blocksize', + 'szip_mask' and 'szip_pixels_per_block' kwargs also added. compression='zlib' is equivalent to (the now deprecated) zlib=True. - Using new compressors requires setting HDF5_PLUGIN_PATH to point to + Using new compressors (except 'szip') requires setting HDF5_PLUGIN_PATH to point to the installation location of the netcdf-c filter plugins. * MFDataset did not aggregate 'name' variable attribute (issue #1153). * issue warning instead of raising an exception if missing_value or diff --git a/README.md b/README.md index 99171e273..dc85804ba 100644 --- a/README.md +++ b/README.md @@ -15,7 +15,7 @@ for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 dramatically improve compression. 
Dataset.createVariable now accepts dimension instances (instead of just dimension names). 'compression' kwarg added to Dataset.createVariable to support new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such -as zstd, bzip2 and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. +as zstd, bzip2 and blosc) as well as szip. Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. diff --git a/test/tst_compression_szip.py b/test/tst_compression_szip.py index e62c8b177..c51efb745 100644 --- a/test/tst_compression_szip.py +++ b/test/tst_compression_szip.py @@ -34,11 +34,12 @@ def runTest(self): assert f.variables['data'].filters() ==\ {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_szip'][:]) - dtest = {'zlib': False, 'zstd': False, 'bzip2': False, 'blosc': False, 'szip': + dtest = {'zlib': False, 'szip': {'coding': 'ec', 'pixels_per_block': 32}, + 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 4, 'fletcher32': False} print(f.variables['data_szip'].filters()) - #assert f.variables['data_szip'].filters() == dtest + assert f.variables['data_szip'].filters() == dtest f.close() if __name__ == '__main__': From 8d0f0f6b5a9913ffbf522875d7a33385dafee8d7 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 14:12:05 -0700 Subject: [PATCH 55/92] try to debug inq_var_szip error --- src/netCDF4/_netCDF4.pyx | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 938514c15..682073b5b 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4460,8 +4460,8 @@ return dictionary containing HDF5 filter parameters.""" IF HAS_SZIP_SUPPORT: ierr = nc_inq_var_szip(self._grpid, self._varid, &iszip_coding,\ &iszip_pixels_per_block) - if ierr != 0: iszip=0 - #_ensure_nc_success(ierr) + #if ierr != 0: iszip=0 + _ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel From a9e5ced5ae98a56b0e482e748cecc1ccfb8e2412 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 14:19:30 -0700 Subject: [PATCH 56/92] update --- src/netCDF4/_netCDF4.pyx | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 682073b5b..cb80b72b4 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4460,7 +4460,10 @@ return dictionary containing HDF5 filter parameters.""" IF HAS_SZIP_SUPPORT: ierr = nc_inq_var_szip(self._grpid, self._varid, &iszip_coding,\ &iszip_pixels_per_block) - #if ierr != 0: iszip=0 + if ierr != 0: + iszip=0 + else: + iszip=1 _ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True From 7a48b10dd42069dca2d4f8b5d64de14d1cd61960 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 14:38:08 -0700 Subject: [PATCH 57/92] update --- src/netCDF4/_netCDF4.pyx | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index cb80b72b4..1a22f8fab 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -4479,7 +4479,9 @@ return dictionary containing HDF5 filter parameters.""" 
filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} filtdict['complevel']=iblosc_complevel if iszip: - filtdict['szip']={'coding':_szip_dict_inv[iszip_coding],'pixels_per_block':iszip_pixels_per_block} + print(iszip_coding, iszip_pixels_per_block) + szip_coding = iszip_coding + filtdict['szip']={'coding':_szip_dict_inv[szip_coding],'pixels_per_block':iszip_pixels_per_block} if ishuffle: filtdict['shuffle']=True if ifletcher32: From 2e1a2a27baf1e07a9a347fa3d9e32d304cb0281d Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 15:33:54 -0700 Subject: [PATCH 58/92] update --- src/netCDF4/_netCDF4.pyx | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 1a22f8fab..188d56956 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -3994,7 +3994,11 @@ behavior is similar to Fortran or Matlab, but different than numpy. _ensure_nc_success(ierr) if szip: IF HAS_SZIP_SUPPORT: - iszip_coding = _szip_dict[szip_coding] + try: + iszip_coding = _szip_dict[szip_coding] + except KeyError: + msg="unknown szip coding ('ec' or 'nn' supported)" + raise ValueError(msg) iszip_pixels_per_block = szip_pixels_per_block ierr = nc_def_var_szip(self._grpid, self._varid, iszip_coding, iszip_pixels_per_block) if ierr != NC_NOERR: @@ -4463,8 +4467,11 @@ return dictionary containing HDF5 filter parameters.""" if ierr != 0: iszip=0 else: - iszip=1 - _ensure_nc_success(ierr) + if iszip_coding: + iszip=1 + else: + iszip=0 + #_ensure_nc_success(ierr) if ideflate: filtdict['zlib']=True filtdict['complevel']=icomplevel @@ -4479,7 +4486,6 @@ return dictionary containing HDF5 filter parameters.""" filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} filtdict['complevel']=iblosc_complevel if iszip: - print(iszip_coding, iszip_pixels_per_block) szip_coding = iszip_coding filtdict['szip']={'coding':_szip_dict_inv[szip_coding],'pixels_per_block':iszip_pixels_per_block} if ishuffle: From c1b100c6e4d5170ef0f3949d409bac20909bbeaf Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 16:22:08 -0700 Subject: [PATCH 59/92] update --- test/tst_compression_szip.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/tst_compression_szip.py b/test/tst_compression_szip.py index c51efb745..0f69c779a 100644 --- a/test/tst_compression_szip.py +++ b/test/tst_compression_szip.py @@ -39,7 +39,7 @@ def runTest(self): 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 4, 'fletcher32': False} print(f.variables['data_szip'].filters()) - assert f.variables['data_szip'].filters() == dtest + #assert f.variables['data_szip'].filters() == dtest f.close() if __name__ == '__main__': From 39743f6817a9ec4d2a3bba61ecd42ee41714a237 Mon Sep 17 00:00:00 2001 From: jswhit Date: Tue, 3 May 2022 22:21:19 -0700 Subject: [PATCH 60/92] update --- test/tst_compression_szip.py | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/test/tst_compression_szip.py b/test/tst_compression_szip.py index 0f69c779a..874ed3821 100644 --- a/test/tst_compression_szip.py +++ b/test/tst_compression_szip.py @@ -34,12 +34,8 @@ def runTest(self): assert f.variables['data'].filters() ==\ {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_szip'][:]) - dtest = {'zlib': False, 
'szip': - {'coding': 'ec', 'pixels_per_block': 32}, - 'zstd': False, 'bzip2': False, 'blosc': False, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} - print(f.variables['data_szip'].filters()) - #assert f.variables['data_szip'].filters() == dtest + dtest = {'zlib': False, 'szip': {'coding': 'ec', 'pixels_per_block': 32}, 'zstd': False, 'bzip2': False, 'blosc': False, 'shuffle': False, 'complevel': 0, 'fletcher32': False} + assert f.variables['data_szip'].filters() == dtest f.close() if __name__ == '__main__': From ed4644cd1794bddd8f2bfc8eb5bf284f3ea9658c Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 4 May 2022 07:29:13 -0700 Subject: [PATCH 61/92] update docs --- README.md | 4 +- docs/index.html | 480 +++++++++++++++++++++------------------ src/netCDF4/_netCDF4.pyx | 21 +- 3 files changed, 284 insertions(+), 221 deletions(-) diff --git a/README.md b/README.md index dc85804ba..7615d10d1 100644 --- a/README.md +++ b/README.md @@ -13,9 +13,9 @@ For details on the latest updates, see the [Changelog](https://github.com/Unidat ??/??/2022: Version [1.6.0](https://pypi.python.org/pypi/netCDF4/1.6.0) released. Support for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead -of just dimension names). 'compression' kwarg added to Dataset.createVariable to support +of just dimension names). 'compression' kwarg added to Dataset.createVariable to support szip as well as new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such -as zstd, bzip2 and blosc) as well as szip. Working arm64 wheels for Apple M1 Silicon now available on pypi. +as zstd, bzip2 and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. diff --git a/docs/index.html b/docs/index.html index cf86fbfa2..99c42d903 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,24 +3,24 @@ - + netCDF4 API documentation - - - - - - -

@@ -576,7 +577,7 @@
 (regenerated tutorial hunks of docs/index.html, here through @@ -1012,7 +1013,7 @@, rendered
 as plain text with the HTML markup stripped: "Creating/Opening/Closing a netCDF",
 "Groups in a netCDF file", "Dimensions in a netCDF file", "Variables in a netCDF file",
 "Attributes in a netCDF file", "Writing data" and "Dealing with time coordinates";
 each -/+ pair wraps an unchanged ">>>" code example, so the rendered text of these hunks
 is identical before and after the change)
    @@ -1052,7 +1053,7 @@ 

    Reading data from a multi NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

    -
    >>> for nf in range(10):
    +
    >>> for nf in range(10):
     ...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
     ...         _ = f.createDimension("x",None)
     ...         x = f.createVariable("x","i",("x",))
    @@ -1061,7 +1062,7 @@ 

    Reading data from a multi

    Now read all the files back in at once with MFDataset

    -
    >>> from netCDF4 import MFDataset
    +
    >>> from netCDF4 import MFDataset
     >>> f = MFDataset("mftest*nc")
     >>> print(f.variables["x"][:])
     [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    @@ -1123,22 +1124,22 @@ 

    Efficient compression of netC

    In our example, try replacing the line

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
     

    with

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
     

    and then

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
     

    or with netcdf-c >= 4.9.0

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
     

    and see how much smaller the resulting files are.

    @@ -1159,7 +1160,7 @@

    Beyond ho Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:

    -
    >>> f = Dataset("complex.nc","w")
    +
    >>> f = Dataset("complex.nc","w")
     >>> size = 3 # length of 1-d complex array
     >>> # create sample complex data.
     >>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
    @@ -1195,7 +1196,7 @@ 

    Beyond ho in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

    -
    >>> print(f)
    +
    >>> print(f)
     <class 'netCDF4._netCDF4.Dataset'>
     root group (NETCDF4 data model, file format HDF5):
         dimensions(sizes): x_dim(3)
    @@ -1220,7 +1221,7 @@ 

    Variable-length (vlen) data types

data type, use the Dataset.createVLType method of a Dataset or Group instance.

    -
    >>> f = Dataset("tst_vlen.nc","w")
    +
    >>> f = Dataset("tst_vlen.nc","w")
     >>> vlen_t = f.createVLType(np.int32, "phony_vlen")
     
    @@ -1230,7 +1231,7 @@

    Variable-length (vlen) data types

    but compound data types cannot. A new variable can then be created using this datatype.

    -
    >>> x = f.createDimension("x",3)
    +
    >>> x = f.createDimension("x",3)
     >>> y = f.createDimension("y",4)
     >>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
     
    @@ -1243,7 +1244,7 @@

    Variable-length (vlen) data types

    In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

    -
    >>> import random
    +
    >>> import random
     >>> random.seed(54321)
     >>> data = np.empty(len(y)*len(x),object)
     >>> for n in range(len(y)*len(x)):
    @@ -1283,7 +1284,7 @@ 

    Variable-length (vlen) data types

    with fixed length greater than 1) when calling the Dataset.createVariable method.

    -
    >>> z = f.createDimension("z",10)
    +
    >>> z = f.createDimension("z",10)
     >>> strvar = f.createVariable("strvar", str, "z")
     
    @@ -1291,7 +1292,7 @@

    Variable-length (vlen) data types

    random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

    -
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    +
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
     >>> data = np.empty(10,"O")
     >>> for n in range(10):
     ...     stringlen = random.randint(2,12)
    @@ -1330,7 +1331,7 @@ 

    Enum data type

    values and their names are used to define an Enum data type using Dataset.createEnumType.

    -
    >>> nc = Dataset('clouds.nc','w')
    +
    >>> nc = Dataset('clouds.nc','w')
     >>> # python dict with allowed values and their names.
     >>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
     ... 'Stratus': 2, 'Clear': 0,
    @@ -1348,7 +1349,7 @@ 

    Enum data type

    is made to write an integer value not associated with one of the specified names.

    -
    >>> time = nc.createDimension('time',None)
    +
    >>> time = nc.createDimension('time',None)
     >>> # create a 1d variable of type 'cloud_type'.
     >>> # The fill_value is set to the 'Missing' named value.
     >>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
    @@ -1385,7 +1386,7 @@ 

    Parallel IO

    available. To use parallel IO, your program must be running in an MPI environment using mpi4py.

    -
    >>> from mpi4py import MPI
    +
    >>> from mpi4py import MPI
     >>> import numpy as np
     >>> from netCDF4 import Dataset
     >>> rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)
    @@ -1397,7 +1398,7 @@ 

    Parallel IO

    when a new dataset is created or an existing dataset is opened, use the parallel keyword to enable parallel access.

    -
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
    +
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
     

    The optional comm keyword may be used to specify a particular @@ -1405,7 +1406,7 @@

    Parallel IO

can now write to the file independently. In this example the process rank is written to a different variable index on each task

    -
    >>> d = nc.createDimension('dim',4)
    +
    >>> d = nc.createDimension('dim',4)
     >>> v = nc.createVariable('var', np.int64, 'dim')
     >>> v[rank] = rank
     >>> nc.close()
    @@ -1472,7 +1473,7 @@ 

    Dealing with strings

    stringtochar is used to convert the numpy string array to an array of characters with one more dimension. For example,

    -
    >>> from netCDF4 import stringtochar
    +
    >>> from netCDF4 import stringtochar
     >>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
     >>> _ = nc.createDimension('nchars',3)
     >>> _ = nc.createDimension('nstrings',None)
    @@ -1505,7 +1506,7 @@ 

    Dealing with strings

    character array dtype under the hood when creating the netcdf compound type. Here's an example:

    -
    >>> nc = Dataset('compoundstring_example.nc','w')
    +
    >>> nc = Dataset('compoundstring_example.nc','w')
     >>> dtype = np.dtype([('observation', 'f4'),
     ...                      ('station_name','S10')])
     >>> station_data_t = nc.createCompoundType(dtype,'station_data')
    @@ -1550,7 +1551,7 @@ 

    In-memory (diskless) Datasets

    object representing the Dataset. Below are examples illustrating both approaches.

    -
    >>> # create a diskless (in-memory) Dataset,
    +
    >>> # create a diskless (in-memory) Dataset,
     >>> # and persist the file to disk when it is closed.
     >>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
     >>> d = nc.createDimension('x',None)
    @@ -1612,7 +1613,7 @@ 

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -1625,7 +1626,7 @@

    In-memory (diskless) Datasets

    View Source -
    # init for netCDF4. package
    +            
    # init for netCDF4. package
     # Docstring comes from extension module _netCDF4.
     from ._netCDF4 import *
     # Need explicit imports for names beginning with underscores
    @@ -1636,7 +1637,7 @@ 

    In-memory (diskless) Datasets

    __has_nc_create_mem__, __has_cdf5_format__, __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__, __has_zstandard_support__, - __has_bzip2_support__, __has_blosc_support__) + __has_bzip2_support__, __has_blosc_support__, __has_szip_support__) __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
    @@ -1653,7 +1654,7 @@

    In-memory (diskless) Datasets

    Dataset:
    - +

    A netCDF Dataset is a collection of dimensions, groups, variables and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file. See Dataset.__init__ for more @@ -1731,7 +1732,7 @@

    In-memory (diskless) Datasets

    Dataset()
    - +

    __init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, keepweakref=False, memory=None, encoding=None, parallel=False, comm=None, info=None, format='NETCDF4')

    @@ -1744,7 +1745,7 @@

    In-memory (diskless) Datasets

    mode: access mode. r means read-only; no data can be modified. w means write; a new file is created, an existing file with -the same name is deleted. 'x' means write, but fail if an existing +the same name is deleted. x means write, but fail if an existing file with the same name already exists. a and r+ mean append; an existing file is opened for reading and writing, if file does not exist already, one is created. @@ -1759,7 +1760,7 @@

    In-memory (diskless) Datasets

    clobber: if True (default), opening a file with mode='w' will clobber an existing file with the same name. if False, an exception will be raised if a file with the same name already exists. -mode='x' is identical to mode='w' with clobber=False.

    +mode=x is identical to mode=w with clobber=False.
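For illustration, a minimal sketch of the difference (the file name is hypothetical):

>>> # 'x' creates the file, but raises an error if it already exists
>>> nc = Dataset("scratch.nc", mode="x")
>>> nc.close()
>>> # equivalent spelling with 'w' and clobber=False; now that scratch.nc
>>> # exists, either form would refuse to overwrite it:
>>> # nc = Dataset("scratch.nc", mode="w", clobber=False)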

    format: underlying file format (one of 'NETCDF4', 'NETCDF4_CLASSIC', 'NETCDF3_CLASSIC', 'NETCDF3_64BIT_OFFSET' or @@ -1802,14 +1803,14 @@

    In-memory (diskless) Datasets

    rendered unusable when the parent Dataset instance is garbage collected.

    memory: if not None, create or open an in-memory Dataset. -If mode = 'r', the memory kwarg must contain a memory buffer object +If mode = r, the memory kwarg must contain a memory buffer object (an object that supports the python buffer interface). The Dataset will then be created with contents taken from this block of memory. -If mode = 'w', the memory kwarg should contain the anticipated size +If mode = w, the memory kwarg should contain the anticipated size of the Dataset in bytes (used only for NETCDF3 files). A memory buffer containing a copy of the Dataset is returned by the -Dataset.close method. Requires netcdf-c version 4.4.1 for mode='r, -netcdf-c 4.6.2 for mode='w'. To persist the file to disk, the raw +Dataset.close method. Requires netcdf-c version 4.4.1 for mode=r +netcdf-c 4.6.2 for mode=w. To persist the file to disk, the raw bytes from the returned buffer can be written into a binary file. The Dataset can also be re-opened using this memory buffer.
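A minimal sketch of the read case, assuming an existing file named example.nc and a netcdf-c library new enough for mode=r (4.4.1, as noted above):

>>> # slurp the raw bytes of an existing netCDF file into memory
>>> with open("example.nc", "rb") as f:
...     nc_bytes = f.read()
>>> # open a Dataset from that buffer; nothing further is read from disk
>>> nc = Dataset("inmemory.nc", mode="r", memory=nc_bytes)
>>> nc.close()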

    @@ -1837,7 +1838,7 @@

    In-memory (diskless) Datasets

    filepath(unknown):
    - +

    filepath(self,encoding=None)

    Get the file system path (or the opendap URL) which was used to @@ -1856,7 +1857,7 @@

    In-memory (diskless) Datasets

    close(unknown):
    - +

    close(self)

    Close the Dataset.

    @@ -1872,7 +1873,7 @@

    In-memory (diskless) Datasets

    isopen(unknown):
    - +

    isopen(self)

    Is the Dataset open or closed?

    @@ -1888,7 +1889,7 @@

    In-memory (diskless) Datasets

    sync(unknown):
    - +

    sync(self)

    Writes all buffered data in the Dataset to the disk file.

    @@ -1904,7 +1905,7 @@

    In-memory (diskless) Datasets

    set_fill_on(unknown):
    - +

    set_fill_on(self)

    Sets the fill mode for a Dataset open for writing to on.

    @@ -1928,7 +1929,7 @@

    In-memory (diskless) Datasets

    set_fill_off(unknown):
    - +

    set_fill_off(self)

    Sets the fill mode for a Dataset open for writing to off.

    @@ -1948,7 +1949,7 @@

    In-memory (diskless) Datasets

    createDimension(unknown):
    - +

    createDimension(self, dimname, size=None)

    Creates a new dimension with the given dimname and size.

    @@ -1972,7 +1973,7 @@

    In-memory (diskless) Datasets

    renameDimension(unknown):
    - +

    renameDimension(self, oldname, newname)

    rename a Dimension named oldname to newname.

    @@ -1988,7 +1989,7 @@

    In-memory (diskless) Datasets

    createCompoundType(unknown):
    - +

    createCompoundType(self, datatype, datatype_name)

    Creates a new compound data type named datatype_name from the numpy @@ -2013,7 +2014,7 @@

    In-memory (diskless) Datasets

    createVLType(unknown):
    - +

    createVLType(self, datatype, datatype_name)

    Creates a new VLEN data type named datatype_name from a numpy @@ -2033,7 +2034,7 @@

    In-memory (diskless) Datasets

    createEnumType(unknown):
    - +

    createEnumType(self, datatype, datatype_name, enum_dict)

    Creates a new Enum data type named datatype_name from a numpy @@ -2054,9 +2055,10 @@

    In-memory (diskless) Datasets

    createVariable(unknown):
    - +

    createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, +szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)

    @@ -2092,10 +2094,10 @@

    In-memory (diskless) Datasets

    If the optional keyword argument compression is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, +Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except -zlib use the HDF5 plugin architecture, which requires that the +zlib and szip use the HDF5 plugin architecture, which requires that the environment variable HDF5_PLUGIN_PATH be set to the location of the plugins built by netcdf-c (unless the plugins are installed in the default location /usr/local/hdf5/lib).

    @@ -2109,7 +2111,7 @@

    In-memory (diskless) Datasets

    A value of zero disables compression.

    If the optional keyword shuffle is True, the HDF5 shuffle filter -will be applied before compressing the data (default True). This +will be applied before compressing the data with zlib (default True). This significantly improves compression. Default is True. Ignored if zlib=False.

    @@ -2119,6 +2121,11 @@

    In-memory (diskless) Datasets

    is the tunable blosc blocksize in bytes (Default 0 means the blocksize is chosen internally).

    +

    The optional kwargs szip_coding and szip_pixels_per_block are ignored +unless the szip compressor is used. szip_coding can be ec (entropy coding) +or nn (nearest neighbor coding). Default is nn. szip_pixels_per_block +can be 4, 8, 16 or 32 (default 8).
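A hedged sketch of these keywords in use (assuming the linked HDF5 library was built with szip support; the variable name is illustrative):

>>> # szip with entropy coding and 16 pixels per block
>>> temp_szip = rootgrp.createVariable("temp_szip", "f4", ("time","level","lat","lon",),
...     compression="szip", szip_coding="ec", szip_pixels_per_block=16)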

    +

    If the optional keyword fletcher32 is True, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default False.

    @@ -2151,7 +2158,7 @@

    In-memory (diskless) Datasets

    The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before -any data is written to it, defaults given in the dict netCDF4.default_fillvals). +any data is written to it, defaults given in the dict netCDF4.default_fillvals). If fill_value is set to False, then the variable is not pre-filled.

    If the optional keyword parameters least_significant_digit or significant_digits are @@ -2219,7 +2226,7 @@

    In-memory (diskless) Datasets

    renameVariable(unknown):
    - +

    renameVariable(self, oldname, newname)

    rename a Variable named oldname to newname

    @@ -2235,7 +2242,7 @@

    In-memory (diskless) Datasets

    createGroup(unknown):
    - +

    createGroup(self, groupname)

    Creates a new Group with the given groupname.

    @@ -2261,7 +2268,7 @@

    In-memory (diskless) Datasets

    ncattrs(unknown):
    - +

    ncattrs(self)

    return netCDF global attribute names for this Dataset or Group in a list.

    @@ -2277,7 +2284,7 @@

    In-memory (diskless) Datasets

    setncattr(unknown):
    - +

    setncattr(self,name,value)

    set a netCDF dataset or group attribute using name,value pair. @@ -2295,7 +2302,7 @@

    In-memory (diskless) Datasets

    setncattr_string(unknown):
    - +

    setncattr_string(self,name,value)

    set a netCDF dataset or group string attribute using name,value pair. @@ -2313,7 +2320,7 @@

    In-memory (diskless) Datasets

    setncatts(unknown):
    - +

    setncatts(self,attdict)

    set a bunch of netCDF dataset or group attributes at once using a python dictionary. @@ -2332,7 +2339,7 @@

    In-memory (diskless) Datasets

    getncattr(unknown):
    - +

    getncattr(self,name)

    retrieve a netCDF dataset or group attribute. @@ -2353,7 +2360,7 @@

    In-memory (diskless) Datasets

    delncattr(unknown):
    - +

    delncattr(self,name,value)

    delete a netCDF dataset or group attribute. Use if you need to delete a @@ -2371,7 +2378,7 @@

    In-memory (diskless) Datasets

    renameAttribute(unknown):
    - +

    renameAttribute(self, oldname, newname)

    rename a Dataset or Group attribute named oldname to newname.

    @@ -2387,7 +2394,7 @@

    In-memory (diskless) Datasets

    renameGroup(unknown):
    - +

    renameGroup(self, oldname, newname)

    rename a Group named oldname to newname (requires netcdf >= 4.3.1).

    @@ -2403,7 +2410,7 @@

    In-memory (diskless) Datasets

    set_auto_chartostring(unknown):
    - +

    set_auto_chartostring(self, True_or_False)

    Call Variable.set_auto_chartostring for all variables contained in this Dataset or @@ -2428,7 +2435,7 @@

    In-memory (diskless) Datasets

    set_auto_maskandscale(unknown):
    - +

    set_auto_maskandscale(self, True_or_False)

    Call Variable.set_auto_maskandscale for all variables contained in this Dataset or @@ -2451,7 +2458,7 @@

    In-memory (diskless) Datasets

    set_auto_mask(unknown):
    - +

    set_auto_mask(self, True_or_False)

    Call Variable.set_auto_mask for all variables contained in this Dataset or @@ -2475,7 +2482,7 @@

    In-memory (diskless) Datasets

    set_auto_scale(unknown):
    - +

    set_auto_scale(self, True_or_False)

    Call Variable.set_auto_scale for all variables contained in this Dataset or @@ -2498,7 +2505,7 @@

    In-memory (diskless) Datasets

    set_always_mask(unknown):
    - +

    set_always_mask(self, True_or_False)

    Call Variable.set_always_mask for all variables contained in @@ -2526,7 +2533,7 @@

    In-memory (diskless) Datasets

    set_ncstring_attrs(unknown):
    - +

    set_ncstring_attrs(self, True_or_False)

    Call Variable.set_ncstring_attrs for all variables contained in @@ -2551,7 +2558,7 @@

    In-memory (diskless) Datasets

    get_variables_by_attributes(unknown):
    - +

get_variables_by_attributes(self, **kwargs)

    Returns a list of variables that match specific conditions.

    @@ -2559,7 +2566,7 @@

    In-memory (diskless) Datasets

    Can pass in key=value parameters and variables are returned that contain all of the matches. For example,

    -
    >>> # Get variables with x-axis attribute.
    +
    >>> # Get variables with x-axis attribute.
     >>> vs = nc.get_variables_by_attributes(axis='X')
     >>> # Get variables with matching "standard_name" attribute
     >>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')
    @@ -2570,7 +2577,7 @@ 

    In-memory (diskless) Datasets

    the attribute value. None is given as the attribute value when the attribute does not exist on the variable. For example,

    -
    >>> # Get Axis variables
    +
    >>> # Get Axis variables
     >>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
     >>> # Get variables that don't have an "axis" attribute
     >>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
    @@ -2589,7 +2596,7 @@ 

    In-memory (diskless) Datasets

    fromcdl(unknown):
    - +

    fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')

    call ncgen via subprocess to create Dataset from CDL @@ -2619,7 +2626,7 @@

    In-memory (diskless) Datasets

    tocdl(unknown):
    - +

    tocdl(self, coordvars=False, data=False, outfile=None)

    call ncdump via subprocess to create CDL @@ -2638,9 +2645,10 @@

    In-memory (diskless) Datasets

    #   - name = <attribute 'name' of 'netCDF4._netCDF4.Dataset' objects> + name
    +

    string name of Group instance

    @@ -2649,109 +2657,121 @@

    In-memory (diskless) Datasets

    #   - groups = <attribute 'groups' of 'netCDF4._netCDF4.Dataset' objects> + groups
    +
    #   - dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Dataset' objects> + dimensions
    +
    #   - variables = <attribute 'variables' of 'netCDF4._netCDF4.Dataset' objects> + variables
    +
    #   - disk_format = <attribute 'disk_format' of 'netCDF4._netCDF4.Dataset' objects> + disk_format
    +
    #   - path = <attribute 'path' of 'netCDF4._netCDF4.Dataset' objects> + path
    +
    #   - parent = <attribute 'parent' of 'netCDF4._netCDF4.Dataset' objects> + parent
    +
    #   - file_format = <attribute 'file_format' of 'netCDF4._netCDF4.Dataset' objects> + file_format
    +
    #   - data_model = <attribute 'data_model' of 'netCDF4._netCDF4.Dataset' objects> + data_model
    +
    #   - cmptypes = <attribute 'cmptypes' of 'netCDF4._netCDF4.Dataset' objects> + cmptypes
    +
    #   - vltypes = <attribute 'vltypes' of 'netCDF4._netCDF4.Dataset' objects> + vltypes
    +
    #   - enumtypes = <attribute 'enumtypes' of 'netCDF4._netCDF4.Dataset' objects> + enumtypes
    +
    #   - keepweakref = <attribute 'keepweakref' of 'netCDF4._netCDF4.Dataset' objects> + keepweakref
    +

    @@ -2764,7 +2784,7 @@

    In-memory (diskless) Datasets

    Variable:
    - +

    A netCDF Variable is used to read and write netCDF data. They are analogous to numpy array objects. See Variable.__init__ for more details.

    @@ -2848,9 +2868,10 @@

    In-memory (diskless) Datasets

    Variable()
    - +

    __init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, -complevel=4, shuffle=True, blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, +complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, +blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)

    @@ -2884,7 +2905,7 @@

    In-memory (diskless) Datasets

    which means the variable is a scalar (and therefore has no dimensions).

    compression: compression algorithm to use. -Currently zlib,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, +Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib use the HDF5 plugin architecture, which requires that the @@ -2898,18 +2919,26 @@

    In-memory (diskless) Datasets

    complevel: the level of compression to use (1 is the fastest, but poorest compression, 9 is the slowest but best compression). Default 4. -Ignored if compression=None. A value of 0 disables compression.

    +Ignored if compression=None or szip. A value of 0 disables compression.

    shuffle: if True, the HDF5 shuffle filter is applied -to improve compression. Default True. Ignored if compression=None.

    +to improve zlib compression. Default True. Ignored unless compression = 'zlib'.

    blosc_shuffle: shuffle filter inside blosc compressor (only relevant if compression kwarg set to one of the blosc compressors). Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise -shuffle)). Default is 1.

    +shuffle)). Default is 1. Ignored if blosc compressor not used.

    blosc_blocksize: tunable blocksize in bytes for blosc -compressors. Default of 0 means blosc library chooses a blocksize.

    +compressors. Default of 0 means blosc library chooses a blocksize. +Ignored if blosc compressor not used.

    + +

    szip_coding: szip coding method. Can be ec (entropy coding) +or nn (nearest neighbor coding). Default is nn. +Ignored if szip compressor not used.

    + +

    szip_pixels_per_block: Can be 4,8,16 or 32 (Default 8). +Ignored if szip compressor not used.
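A sketch of the blosc keywords in use, assuming the netcdf-c blosc plugin is installed and HDF5_PLUGIN_PATH points at it (names are illustrative):

>>> # blosc_lz4 with bitwise shuffle; blocksize 0 lets the blosc library choose
>>> v = rootgrp.createVariable("var_blosc", "f4", ("time","lat","lon",),
...     compression="blosc_lz4", blosc_shuffle=2, blosc_blocksize=0)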

    fletcher32: if True (default False), the Fletcher32 checksum algorithm is used for error detection.

    @@ -2967,7 +2996,7 @@

    In-memory (diskless) Datasets

    value that the variable gets filled with before any data is written to it) is replaced with this value. If fill_value is set to False, then the variable is not pre-filled. The default netCDF fill values can be found -in the dictionary netCDF4.default_fillvals.

    +in the dictionary netCDF4.default_fillvals.

    chunk_cache: If specified, sets the chunk cache size for this variable. Persists as long as Dataset is open. Use set_var_chunk_cache to @@ -2988,7 +3017,7 @@

    In-memory (diskless) Datasets

    group(unknown):
    - +

    group(self)

    return the group that this Variable is a member of.

    @@ -3004,7 +3033,7 @@

    In-memory (diskless) Datasets

    ncattrs(unknown):
    - +

    ncattrs(self)

    return netCDF attribute names for this Variable in a list.

    @@ -3020,7 +3049,7 @@

    In-memory (diskless) Datasets

    setncattr(unknown):
    - +

    setncattr(self,name,value)

    set a netCDF variable attribute using name,value pair. Use if you need to set a @@ -3038,7 +3067,7 @@

    In-memory (diskless) Datasets

    setncattr_string(unknown):
    - +

    setncattr_string(self,name,value)

    set a netCDF variable string attribute using name,value pair. @@ -3057,7 +3086,7 @@

    In-memory (diskless) Datasets

    setncatts(unknown):
    - +

    setncatts(self,attdict)

    set a bunch of netCDF variable attributes at once using a python dictionary. @@ -3076,7 +3105,7 @@

    In-memory (diskless) Datasets

    getncattr(unknown):
    - +

    getncattr(self,name)

    retrieve a netCDF variable attribute. Use if you need to set a @@ -3097,7 +3126,7 @@

    In-memory (diskless) Datasets

    delncattr(unknown):
    - +

    delncattr(self,name,value)

    delete a netCDF variable attribute. Use if you need to delete a @@ -3115,7 +3144,7 @@

    In-memory (diskless) Datasets

    filters(unknown):
    - +

    filters(self)

    return dictionary containing HDF5 filter parameters.

    @@ -3131,7 +3160,7 @@

    In-memory (diskless) Datasets

    quantization(unknown):
    - +

    quantization(self)

    return number of significant digits and the algorithm used in quantization. @@ -3148,7 +3177,7 @@

    In-memory (diskless) Datasets

    endian(unknown):
    - +

    endian(self)

    return endian-ness (little,big,native) of variable (as stored in HDF5 file).

    @@ -3164,7 +3193,7 @@

    In-memory (diskless) Datasets

    chunking(unknown):
    - +

    chunking(self)

    return variable chunking information. If the dataset is @@ -3183,7 +3212,7 @@

    In-memory (diskless) Datasets

    get_var_chunk_cache(unknown):
    - +

    get_var_chunk_cache(self)

    return variable chunk cache information in a tuple (size,nelems,preemption). @@ -3201,7 +3230,7 @@

    In-memory (diskless) Datasets

    set_var_chunk_cache(unknown):
    - +

    set_var_chunk_cache(self,size=None,nelems=None,preemption=None)

    change variable chunk cache settings. @@ -3219,7 +3248,7 @@

    In-memory (diskless) Datasets

    renameAttribute(unknown):
    - +

    renameAttribute(self, oldname, newname)

    rename a Variable attribute named oldname to newname.

    @@ -3235,7 +3264,7 @@

    In-memory (diskless) Datasets

    assignValue(unknown):
    - +

    assignValue(self, val)

    assign a value to a scalar variable. Provided for compatibility with @@ -3252,7 +3281,7 @@

    In-memory (diskless) Datasets

    getValue(unknown):
    - +

    getValue(self)

    get the value of a scalar variable. Provided for compatibility with @@ -3269,7 +3298,7 @@

    In-memory (diskless) Datasets

    set_auto_chartostring(unknown):
    - +

    set_auto_chartostring(self,chartostring)

    turn on or off automatic conversion of character variable data to and @@ -3300,7 +3329,7 @@

    In-memory (diskless) Datasets

    use_nc_get_vars(unknown):
    - +

    use_nc_get_vars(self,_use_get_vars)

    enable the use of netcdf library routine nc_get_vars @@ -3320,7 +3349,7 @@

    In-memory (diskless) Datasets

    set_auto_maskandscale(unknown):
    - +

    set_auto_maskandscale(self,maskandscale)

    turn on or off automatic conversion of variable data to and @@ -3384,7 +3413,7 @@

    In-memory (diskless) Datasets

    set_auto_scale(unknown):
    - +

    set_auto_scale(self,scale)

    turn on or off automatic packing/unpacking of variable @@ -3433,7 +3462,7 @@

    In-memory (diskless) Datasets

    set_auto_mask(unknown):
    - +

    set_auto_mask(self,mask)

    turn on or off automatic conversion of variable data to and @@ -3468,7 +3497,7 @@

    In-memory (diskless) Datasets

    set_always_mask(unknown):
    - +

    set_always_mask(self,always_mask)

    turn on or off conversion of data without missing values to regular @@ -3491,7 +3520,7 @@

    In-memory (diskless) Datasets

    set_ncstring_attrs(unknown):
    - +

set_ncstring_attrs(self,ncstring_attrs)

    turn on or off creating NC_STRING string attributes.

    @@ -3513,7 +3542,7 @@

    In-memory (diskless) Datasets

    set_collective(unknown):
    - +

    set_collective(self,True_or_False)

    turn on or off collective parallel IO access. Ignored if file is not @@ -3530,7 +3559,7 @@

    In-memory (diskless) Datasets

    get_dims(unknown):
    - +

    get_dims(self)

    return a tuple of Dimension instances associated with this @@ -3542,9 +3571,10 @@
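A small sketch, reusing the 4-dimensional temp variable from the tutorial above:

>>> # one Dimension instance per axis of the variable
>>> time_dim, level_dim, lat_dim, lon_dim = temp.get_dims()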

    In-memory (diskless) Datasets

    #   - name = <attribute 'name' of 'netCDF4._netCDF4.Variable' objects> + name
    +

    string name of Variable instance

    @@ -3553,9 +3583,10 @@

    In-memory (diskless) Datasets

    #   - datatype = <attribute 'datatype' of 'netCDF4._netCDF4.Variable' objects> + datatype
    +

    numpy data type (for primitive data types) or VLType/CompoundType/EnumType instance (for compound, vlen or enum data types)

    @@ -3566,9 +3597,10 @@

    In-memory (diskless) Datasets

    #   - shape = <attribute 'shape' of 'netCDF4._netCDF4.Variable' objects> + shape
    +

    find current sizes of all variable dimensions

    @@ -3577,9 +3609,10 @@

    In-memory (diskless) Datasets

    #   - size = <attribute 'size' of 'netCDF4._netCDF4.Variable' objects> + size
    +

    Return the number of stored elements.

    @@ -3588,9 +3621,10 @@

    In-memory (diskless) Datasets

    #   - dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Variable' objects> + dimensions
    +

get variable's dimension names

    @@ -3599,55 +3633,61 @@

    In-memory (diskless) Datasets

    #   - ndim = <attribute 'ndim' of 'netCDF4._netCDF4.Variable' objects> + ndim
    +
    #   - dtype = <attribute 'dtype' of 'netCDF4._netCDF4.Variable' objects> + dtype
    +
    #   - mask = <attribute 'mask' of 'netCDF4._netCDF4.Variable' objects> + mask
    +
    #   - scale = <attribute 'scale' of 'netCDF4._netCDF4.Variable' objects> + scale
    +
    #   - always_mask = <attribute 'always_mask' of 'netCDF4._netCDF4.Variable' objects> + always_mask
    +
    #   - chartostring = <attribute 'chartostring' of 'netCDF4._netCDF4.Variable' objects> + chartostring
    +
    @@ -3660,7 +3700,7 @@

    In-memory (diskless) Datasets

    Dimension:
    - +

    A netCDF Dimension is used to describe the coordinates of a Variable. See Dimension.__init__ for more details.

    @@ -3686,7 +3726,7 @@

    In-memory (diskless) Datasets

    Dimension()
    - +

    __init__(self, group, name, size=None)

    Dimension constructor.

    @@ -3712,7 +3752,7 @@

    In-memory (diskless) Datasets

    group(unknown):
    - +

    group(self)

    return the group that this Dimension is a member of.

    @@ -3728,7 +3768,7 @@

    In-memory (diskless) Datasets

    isunlimited(unknown):
    - +

    isunlimited(self)

    returns True if the Dimension instance is unlimited, False otherwise.

    @@ -3739,9 +3779,10 @@

    In-memory (diskless) Datasets

    #   - name = <attribute 'name' of 'netCDF4._netCDF4.Dimension' objects> + name
    +

    string name of Dimension instance

    @@ -3750,9 +3791,10 @@

    In-memory (diskless) Datasets

    #   - size = <attribute 'size' of 'netCDF4._netCDF4.Dimension' objects> + size
    +

    current size of Dimension (calls len on Dimension instance)

    @@ -3768,7 +3810,7 @@

    In-memory (diskless) Datasets

    Group(netCDF4.Dataset):
    - +

Groups define a hierarchical namespace within a netCDF file. They are analogous to directories in a unix filesystem. Each Group behaves like a Dataset within a Dataset, and can contain its own variables, @@ -3792,7 +3834,7 @@

    In-memory (diskless) Datasets

    Group()
    - +

    __init__(self, parent, name) Group constructor.

    @@ -3816,7 +3858,7 @@

    In-memory (diskless) Datasets

    close(unknown):
    - +

    close(self)

    overrides Dataset close method which does not apply to Group @@ -3886,7 +3928,7 @@

    Inherited Members
    MFDataset(netCDF4.Dataset):
    - +

    Class for reading multi-file netCDF Datasets, making variables spanning multiple files appear as if they were in one file. Datasets must be in NETCDF4_CLASSIC, NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET @@ -3896,7 +3938,7 @@

    Inherited Members

    Example usage (See MFDataset.__init__ for more details):

    -
    >>> import numpy as np
    +
    >>> import numpy as np
     >>> # create a series of netCDF files with a variable sharing
     >>> # the same unlimited dimension.
     >>> for nf in range(10):
    @@ -3923,7 +3965,7 @@ 
    Inherited Members
    MFDataset(files, check=False, aggdim=None, exclude=[], master_file=None)
    - +

    __init__(self, files, check=False, aggdim=None, exclude=[], master_file=None)

    @@ -3968,7 +4010,7 @@
    Inherited Members
    ncattrs(self):
    - +

    ncattrs(self)

    return the netcdf attribute names from the master file.

    @@ -3984,7 +4026,7 @@
    Inherited Members
    close(self):
    - +

    close(self)

    close all the open files.

    @@ -4052,13 +4094,13 @@
    Inherited Members
    MFTime(netCDF4._netCDF4._Variable):
    - +

    Class providing an interface to a MFDataset time Variable by imposing a unique common time unit and/or calendar to all files.

    Example usage (See MFTime.__init__ for more details):

    -
    >>> import numpy as np
    +
    >>> import numpy as np
     >>> f1 = Dataset("mftest_1.nc","w", format="NETCDF4_CLASSIC")
     >>> f2 = Dataset("mftest_2.nc","w", format="NETCDF4_CLASSIC")
     >>> f1.createDimension("time",None)
    @@ -4094,7 +4136,7 @@ 
    Inherited Members
    MFTime(time, units=None, calendar=None)
    - +

    __init__(self, time, units=None, calendar=None)

    Create a time Variable with units consistent across a multifile @@ -4138,7 +4180,7 @@

    Inherited Members
    CompoundType:
    - +

A CompoundType instance is used to describe a compound data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. @@ -4157,7 +4199,7 @@

    Inherited Members
    CompoundType()
    - +

    __init__(group, datatype, datatype_name)

    CompoundType constructor.

    @@ -4186,28 +4228,31 @@
    Inherited Members
    #   - dtype = <attribute 'dtype' of 'netCDF4._netCDF4.CompoundType' objects> + dtype
    +
    #   - dtype_view = <attribute 'dtype_view' of 'netCDF4._netCDF4.CompoundType' objects> + dtype_view
    +
    #   - name = <attribute 'name' of 'netCDF4._netCDF4.CompoundType' objects> + name
    +
    @@ -4220,7 +4265,7 @@
    Inherited Members
    VLType:
    - +

A VLType instance is used to describe a variable length (VLEN) data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. See @@ -4238,7 +4283,7 @@

    Inherited Members
    VLType()
    - +

    __init__(group, datatype, datatype_name)

    VLType constructor.

    @@ -4261,19 +4306,21 @@
    Inherited Members
    #   - dtype = <attribute 'dtype' of 'netCDF4._netCDF4.VLType' objects> + dtype
    +
    #   - name = <attribute 'name' of 'netCDF4._netCDF4.VLType' objects> + name
    +
    @@ -4285,7 +4332,7 @@
    Inherited Members
    date2num(unknown):
    - +

    date2num(dates, units, calendar=None, has_year_zero=None)

    Return numeric time values given datetime objects. The units @@ -4345,7 +4392,7 @@

    Inherited Members
    num2date(unknown):
    - +

    num2date(times, units, calendar=u'standard', only_use_cftime_datetimes=True, only_use_python_datetimes=False, has_year_zero=None)

    Return datetime objects given numeric time values. The units @@ -4417,7 +4464,7 @@

    Inherited Members
    date2index(unknown):
    - +

    date2index(dates, nctime, calendar=None, select=u'exact', has_year_zero=None)

    Return indices of a netCDF time variable corresponding to the given dates.

    @@ -4471,7 +4518,7 @@
    Inherited Members
    stringtochar(unknown):
    - +

    stringtochar(a,encoding='utf-8')

    convert a string array to a character array with one extra dimension

    @@ -4498,7 +4545,7 @@
    Inherited Members
    chartostring(unknown):
    - +

    chartostring(b,encoding='utf-8')

    convert a character array to a string array with one less dimension.

    @@ -4525,7 +4572,7 @@
    Inherited Members
    stringtoarr(unknown):
    - +

    stringtoarr(a, NUMCHARS,dtype='S')

    convert a string to a character array of length NUMCHARS
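For example, a brief sketch:

>>> from netCDF4 import stringtoarr
>>> # a character array of length 5; the trailing elements past "foo" are padding
>>> chars = stringtoarr("foo", 5)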

    @@ -4553,7 +4600,7 @@
    Inherited Members
    getlibversion(unknown):
    - +

    getlibversion()

    returns a string describing the version of the netcdf library @@ -4571,7 +4618,7 @@

    Inherited Members
    EnumType:
    - +

An EnumType instance is used to describe an Enum data type, and can be passed to the Dataset.createVariable method of a Dataset or Group instance. See @@ -4589,7 +4636,7 @@

    Inherited Members
    EnumType()
    - +

    __init__(group, datatype, datatype_name, enum_dict)

    EnumType constructor.

    @@ -4615,28 +4662,31 @@
    Inherited Members
    #   - dtype = <attribute 'dtype' of 'netCDF4._netCDF4.EnumType' objects> + dtype
    +
    #   - name = <attribute 'name' of 'netCDF4._netCDF4.EnumType' objects> + name
    +
    #   - enum_dict = <attribute 'enum_dict' of 'netCDF4._netCDF4.EnumType' objects> + enum_dict
    +
    @@ -4648,7 +4698,7 @@
    Inherited Members
    get_chunk_cache(unknown):
    - +

    get_chunk_cache()

    return current netCDF chunk cache information in a tuple (size,nelems,preemption). @@ -4666,7 +4716,7 @@

    Inherited Members
    set_chunk_cache(unknown):
    - +

    set_chunk_cache(self,size=None,nelems=None,preemption=None)

    change netCDF4 chunk cache settings. diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 188d56956..c835943dc 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2691,7 +2691,7 @@ is an empty tuple, which means the variable is a scalar. If the optional keyword argument `compression` is set, the data will be compressed in the netCDF file using the specified compression algorithm. -Currently `zlib`,`szip`, `zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, +Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` and `szip` use the HDF5 plugin architecture, which requires that the @@ -2718,6 +2718,11 @@ unless the blosc compressor is used. `blosc_shuffle` can be 0 (no shuffle), is the tunable blosc blocksize in bytes (Default 0 means the blocksize is chosen internally). +The optional kwargs `szip_coding` and `szip_pixels_per_block` are ignored +unless the szip compressor is used. `szip_coding` can be `ec` (entropy coding) +or `nn` (nearest neighbor coding). Default is `nn`. `szip_pixels_per_block` +can be 4, 8, 16 or 32 (default 8). + If the optional keyword `fletcher32` is `True`, the Fletcher32 HDF5 checksum algorithm is activated to detect errors. Default `False`. @@ -3713,7 +3718,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. which means the variable is a scalar (and therefore has no dimensions). **`compression`**: compression algorithm to use. - Currently `zlib`,`szip`, `zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, + Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` use the HDF5 plugin architecture, which requires that the @@ -3727,7 +3732,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`complevel`**: the level of compression to use (1 is the fastest, but poorest compression, 9 is the slowest but best compression). Default 4. - Ignored if `compression=None`. A value of 0 disables compression. + Ignored if `compression=None` or `szip`. A value of 0 disables compression. **`shuffle`**: if `True`, the HDF5 shuffle filter is applied to improve zlib compression. Default `True`. Ignored unless `compression = 'zlib'`. @@ -3735,10 +3740,18 @@ behavior is similar to Fortran or Matlab, but different than numpy. **`blosc_shuffle`**: shuffle filter inside blosc compressor (only relevant if compression kwarg set to one of the blosc compressors). Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise - shuffle)). Default is 1. + shuffle)). Default is 1. Ignored if blosc compressor not used. **`blosc_blocksize`**: tunable blocksize in bytes for blosc compressors. Default of 0 means blosc library chooses a blocksize. + Ignored if blosc compressor not used. + + **`szip_coding`**: szip coding method. Can be `ec` (entropy coding) + or `nn` (nearest neighbor coding). Default is `nn`. + Ignored if szip compressor not used. + + **`szip_pixels_per_block`**: Can be 4,8,16 or 32 (Default 8). + Ignored if szip compressor not used. **`fletcher32`**: if `True` (default `False`), the Fletcher32 checksum algorithm is used for error detection. 
From 32434d3a60fde4be20f95608f7c8cafa9abaedb1 Mon Sep 17 00:00:00 2001 From: jswhit Date: Mon, 9 May 2022 12:50:03 -0600 Subject: [PATCH 62/92] update docs --- docs/index.html | 41 +++++++++++++++++++++++++--------------- src/netCDF4/_netCDF4.pyx | 36 ++++++++++++++++++++++------------- 2 files changed, 49 insertions(+), 28 deletions(-) diff --git a/docs/index.html b/docs/index.html index 99c42d903..79a1a735c 100644 --- a/docs/index.html +++ b/docs/index.html @@ -488,6 +488,11 @@

    Quick Install

    • the easiest way to get going is to install via pip install netCDF4. (or if you use the conda package manager conda install -c conda-forge netCDF4).
    • +
• installing binary wheels with pip will not get you the optional compression filters (which are enabled +via external plugins). Starting with version 4.9.0, the plugins are available +via the netcdf-c library install, and are installed +in /usr/local/hdf5/lib/plugin by default. The environment variable HDF5_PLUGIN_PATH should be set +to point to the location of the plugin install directory (a short sketch follows below).
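A minimal sketch of pointing the library at the plugin directory (the path is illustrative; use wherever the plugins were actually installed):

>>> import os
>>> os.environ["HDF5_PLUGIN_PATH"] = "/usr/local/hdf5/lib/plugin"
>>> import netCDF4  # the filters are then found when compressed variables are read or written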

    Developer Install

    @@ -1077,17 +1082,19 @@

    Reading data from a multi

    Efficient compression of netCDF variables

    -

    Data stored in netCDF 4 Variable objects can be compressed and -decompressed on the fly. The parameters for the compression are -determined by the compression, complevel and shuffle keyword arguments -to the Dataset.createVariable method. To turn on -compression, set compression=zlib. The complevel keyword regulates the -speed and efficiency of the compression (1 being fastest, but lowest +

    Data stored in netCDF Variable objects can be compressed and +decompressed on the fly. The compression algorithm used is determined +by the compression keyword argument to the Dataset.createVariable method. +zlib compression is always available, szip is available if the linked HDF5 +library supports it, and zstd, bzip2, blosc_lz,blosc_lz4,blosc_lz4hc, +blosc_zlib and blosc_zstd are available via optional external plugins. +The complevel keyword regulates the +speed and efficiency of the compression for zlib, bzip and zstd (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of complevel is 4. Setting shuffle=False will turn off the HDF5 shuffle filter, which de-interlaces a block of data before -compression by reordering the bytes. The shuffle filter can -significantly improve compression ratios, and is on by default. Setting +zlib compression by reordering the bytes. The shuffle filter can +significantly improve compression ratios, and is on by default if compression=zlib. Setting fletcher32 keyword argument to Dataset.createVariable to True (it's False by default) enables the Fletcher32 checksum algorithm for error detection. @@ -1097,7 +1104,14 @@

    Efficient compression of netC Dataset.createVariable. These keyword arguments only are relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and are silently ignored if the file -format is NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA.

    +format is NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA. +If netcdf-c compression filter plugins are installed, and the +HDF5_PLUGIN_PATH environment variable is set to point to where the plugins +are installed, then zstd, bzip2, and the blosc family of compressors +can be used.
    +If the HDF5 library is built with szip support, compression=szip can also +be used (in conjunction with the szip_coding and szip_pixels_per_block keyword +arguments).
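For instance, a rough sketch assuming the external zstd plugin is installed (the variable name is illustrative):

>>> # zstd at the default compression level
>>> temp_zstd = rootgrp.createVariable("temp_zstd", "f4", ("time","level","lat","lon",),
...     compression="zstd", complevel=4)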

    If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of @@ -1613,7 +1627,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -2098,9 +2112,9 @@

    In-memory (diskless) Datasets

    blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except zlib and szip use the HDF5 plugin architecture, which requires that the -environment variable HDF5_PLUGIN_PATH be set to the location of the +environment variable HDF5_PLUGIN_PATH be set to the location of the external plugins built by netcdf-c (unless the plugins are installed in the -default location /usr/local/hdf5/lib).

    +default location /usr/local/hdf5/lib/plugin).

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2153,9 +2167,6 @@

    In-memory (diskless) Datasets

    opposite format as the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness.

    -

    The compression, zlib, complevel, shuffle, fletcher32, contiguous, chunksizes and endian -keywords are silently ignored for netCDF 3 files that do not use HDF5.

    -

    The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before any data is written to it, defaults given in the dict netCDF4.default_fillvals). diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index c835943dc..d69298668 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -27,6 +27,11 @@ types) are not supported. - the easiest way to get going is to install via `pip install netCDF4`. (or if you use the [conda](http://conda.io) package manager `conda install -c conda-forge netCDF4`). + - installing binary wheels with pip will not get you the optional compression filters (which are enabled + via external plugins). Starting with version 4.9.0, The plugins are available + via the netcdf-c library install, and are installed + in `/usr/local/hdf5/lib/plugin` by default. The environment variable `HDF5_PLUGIN_PATH` should be set + to point to the location of the plugin install directory. ## Developer Install @@ -641,17 +646,19 @@ datasets. ## Efficient compression of netCDF variables -Data stored in netCDF 4 `Variable` objects can be compressed and -decompressed on the fly. The parameters for the compression are -determined by the `compression`, `complevel` and `shuffle` keyword arguments -to the `Dataset.createVariable` method. To turn on -compression, set compression=`zlib`. The `complevel` keyword regulates the -speed and efficiency of the compression (1 being fastest, but lowest +Data stored in netCDF `Variable` objects can be compressed and +decompressed on the fly. The compression algorithm used is determined +by the `compression` keyword argument to the `Dataset.createVariable` method. +`zlib` compression is always available, `szip` is available if the linked HDF5 +library supports it, and `zstd`, `bzip2`, `blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, +`blosc_zlib` and `blosc_zstd` are available via optional external plugins. +The `complevel` keyword regulates the +speed and efficiency of the compression for `zlib`, `bzip` and `zstd` (1 being fastest, but lowest compression ratio, 9 being slowest but best compression ratio). The default value of `complevel` is 4. Setting `shuffle=False` will turn off the HDF5 shuffle filter, which de-interlaces a block of data before -compression by reordering the bytes. The shuffle filter can -significantly improve compression ratios, and is on by default. Setting +`zlib` compression by reordering the bytes. The shuffle filter can +significantly improve compression ratios, and is on by default if `compression=zlib`. Setting `fletcher32` keyword argument to `Dataset.createVariable` to `True` (it's `False` by default) enables the Fletcher32 checksum algorithm for error detection. @@ -662,6 +669,12 @@ and `endian` keyword arguments to are relevant for `NETCDF4` and `NETCDF4_CLASSIC` files (where the underlying file format is HDF5) and are silently ignored if the file format is `NETCDF3_CLASSIC`, `NETCDF3_64BIT_OFFSET` or `NETCDF3_64BIT_DATA`. +If netcdf-c compression filter plugins are installed, and the +`HDF5_PLUGIN_PATH` environment variable is set to point to where the plugins +are installed, then `zstd`, `bzip2`, and the `blosc` family of compressors +can be used. If the HDF5 library is built with szip support, compression=`szip` can also +be used (in conjunction with the `szip_coding` and `szip_pixels_per_block` keyword +arguments). 
If your data only has a certain number of digits of precision (say for example, it is temperature data that was measured with a precision of @@ -2695,9 +2708,9 @@ Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except `zlib` and `szip` use the HDF5 plugin architecture, which requires that the -environment variable `HDF5_PLUGIN_PATH` be set to the location of the +environment variable `HDF5_PLUGIN_PATH` be set to the location of the external plugins built by netcdf-c (unless the plugins are installed in the -default location `/usr/local/hdf5/lib`). +default location `/usr/local/hdf5/lib/plugin`). If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -2750,9 +2763,6 @@ but if the data is always going to be read on a computer with the opposite format as the one used to create the file, there may be some performance advantage to be gained by setting the endian-ness. -The `compression, zlib, complevel, shuffle, fletcher32, contiguous, chunksizes` and `endian` -keywords are silently ignored for netCDF 3 files that do not use HDF5. - The optional keyword `fill_value` can be used to override the default netCDF `_FillValue` (the value that the variable gets filled with before any data is written to it, defaults given in the dict `netCDF4.default_fillvals`). From a9731f0d4d5c82f0f6f56a789a689db96603f13c Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 11 May 2022 08:09:25 -0600 Subject: [PATCH 63/92] install plugin .so files inside package --- .github/workflows/build_master.yml | 8 +++----- setup.py | 12 +++++++++++- src/netCDF4/__init__.py | 4 ++++ 3 files changed, 18 insertions(+), 6 deletions(-) diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml index 35e6d43fb..10e5fa508 100644 --- a/.github/workflows/build_master.yml +++ b/.github/workflows/build_master.yml @@ -31,12 +31,9 @@ jobs: export LDFLAGS="-L${NETCDF_DIR}/lib" export LIBS="-lhdf5_mpich_hl -lhdf5_mpich -lm -lz" autoreconf -i - ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 + ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 make -j 2 make install - pwd - mkdir ${NETCDF_DIR}/hdf5_plugins - /bin/mv -f plugins ${NETCDF_DIR} popd # - name: The job has failed @@ -53,11 +50,12 @@ jobs: - name: Install netcdf4-python run: | export PATH=${NETCDF_DIR}/bin:${PATH} + export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/.libs python setup.py install - name: Test run: | export PATH=${NETCDF_DIR}/bin:${PATH} - export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs + #export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs python checkversion.py # serial cd test diff --git a/setup.py b/setup.py index 9cf48528e..cd7d39715 100644 --- a/setup.py +++ b/setup.py @@ -1,4 +1,4 @@ -import os, sys, subprocess +import os, sys, subprocess, glob import os.path as osp import configparser from setuptools import setup, Extension @@ -682,6 +682,15 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): else: ext_modules = None +# if NETCDF_PLUGIN_DIR set, install netcdf-c plugin shared objects in package +# (should point to location of .so files built by netcdf-c) +if os.environ.get("NETCDF_PLUGIN_DIR"): + plugin_dir = os.environ.get("NETCDF_PLUGIN_DIR") + plugins = glob.glob(os.path.join(plugin_dir, 
"*.so")) + data_files = plugins +else: + data_files = None + setup(name="netCDF4", cmdclass=cmdclass, version=extract_version(netcdf4_src_pyx), @@ -707,5 +716,6 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): "Operating System :: OS Independent"], packages=['netCDF4'], package_dir={'':'src'}, + data_files=[('netCDF4',data_files)], ext_modules=ext_modules, **setuptools_extra_kwargs) diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py index f518b128e..f53195897 100644 --- a/src/netCDF4/__init__.py +++ b/src/netCDF4/__init__.py @@ -10,5 +10,9 @@ __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__, __has_zstandard_support__, __has_bzip2_support__, __has_blosc_support__, __has_szip_support__) +import os __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] +# if HDF5_PLUGIN_PATH not set, point to plugins directory inside package +if 'HDF5_PLUGIN_PATH' not in os.environ: + os.environ['HDF5_PLUGIN_PATH']=__path__[0] From d496f040ffc72fe76adcef96352f946b207af53a Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 11 May 2022 08:19:32 -0600 Subject: [PATCH 64/92] update docs --- docs/index.html | 25 +++++++------------------ src/netCDF4/_netCDF4.pyx | 20 +++----------------- 2 files changed, 10 insertions(+), 35 deletions(-) diff --git a/docs/index.html b/docs/index.html index 79a1a735c..0164a5609 100644 --- a/docs/index.html +++ b/docs/index.html @@ -488,11 +488,6 @@

    Quick Install

    • the easiest way to get going is to install via pip install netCDF4. (or if you use the conda package manager conda install -c conda-forge netCDF4).
    • -
    • installing binary wheels with pip will not get you the optional compression filters (which are enabled -via external plugins). Starting with version 4.9.0, The plugins are available -via the netcdf-c library install, and are installed -in /usr/local/hdf5/lib/plugin by default. The environment variable HDF5_PLUGIN_PATH should be set -to point to the location of the plugin install directory.

    Developer Install

    @@ -1105,10 +1100,6 @@

    Efficient compression of netC are relevant for NETCDF4 and NETCDF4_CLASSIC files (where the underlying file format is HDF5) and are silently ignored if the file format is NETCDF3_CLASSIC, NETCDF3_64BIT_OFFSET or NETCDF3_64BIT_DATA. -If netcdf-c compression filter plugins are installed, and the -HDF5_PLUGIN_PATH environment variable is set to point to where the plugins -are installed, then zstd, bzip2, and the blosc family of compressors -can be used.
    If the HDF5 library is built with szip support, compression=szip can also be used (in conjunction with the szip_coding and szip_pixels_per_block keyword arguments).

    @@ -1627,7 +1618,7 @@

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -1652,8 +1643,12 @@

    In-memory (diskless) Datasets

    __has_parallel4_support__, __has_pnetcdf_support__, __has_quantization_support__, __has_zstandard_support__, __has_bzip2_support__, __has_blosc_support__, __has_szip_support__) +import os __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] +# if HDF5_PLUGIN_PATH not set, point to plugins directory inside package +if 'HDF5_PLUGIN_PATH' not in os.environ: + os.environ['HDF5_PLUGIN_PATH']=__path__[0]
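A quick, hedged way to sanity-check this behaviour after installing a build from this branch (nothing below is part of the patch itself):

    import os
    import netCDF4

    # If the user did not set HDF5_PLUGIN_PATH, the __init__ above points it at
    # the package directory, where setup.py is meant to have copied the plugin
    # .so files when NETCDF_PLUGIN_DIR was set at build time.
    print("HDF5_PLUGIN_PATH =", os.environ.get("HDF5_PLUGIN_PATH"))

    # Capability flags imported in the same __init__; they describe the netcdf-c
    # library the module was built against, not whether a plugin is on disk.
    print(netCDF4.__has_zstandard_support__, netCDF4.__has_bzip2_support__,
          netCDF4.__has_blosc_support__, netCDF4.__has_szip_support__)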
    @@ -2111,10 +2106,7 @@

    In-memory (diskless) Datasets

    Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except -zlib and szip use the HDF5 plugin architecture, which requires that the -environment variable HDF5_PLUGIN_PATH be set to the location of the external -plugins built by netcdf-c (unless the plugins are installed in the -default location /usr/local/hdf5/lib/plugin).

    +zlib and szip use the HDF5 plugin architecture.

    If the optional keyword zlib is True, the data will be compressed in the netCDF file using zlib compression (default False). The use of this option is @@ -2919,10 +2911,7 @@

    In-memory (diskless) Datasets

    Currently zlib,szip,zstd,bzip2,blosc_lz,blosc_lz4,blosc_lz4hc, blosc_zlib and blosc_zstd are supported. Default is None (no compression). All of the compressors except -zlib use the HDF5 plugin architecture, which requires that the -environment variable HDF5_PLUGIN_PATH be set to the location of the -plugins built by netcdf-c (unless the plugins are installed in the -default location /usr/local/hdf5/lib).

    +zlib and szip use the HDF5 plugin architecture.

    zlib: if True, data assigned to the Variable instance is compressed on disk. Default False. Deprecated - use diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index d69298668..278f21224 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -27,11 +27,6 @@ types) are not supported. - the easiest way to get going is to install via `pip install netCDF4`. (or if you use the [conda](http://conda.io) package manager `conda install -c conda-forge netCDF4`). - - installing binary wheels with pip will not get you the optional compression filters (which are enabled - via external plugins). Starting with version 4.9.0, The plugins are available - via the netcdf-c library install, and are installed - in `/usr/local/hdf5/lib/plugin` by default. The environment variable `HDF5_PLUGIN_PATH` should be set - to point to the location of the plugin install directory. ## Developer Install @@ -669,10 +664,7 @@ and `endian` keyword arguments to are relevant for `NETCDF4` and `NETCDF4_CLASSIC` files (where the underlying file format is HDF5) and are silently ignored if the file format is `NETCDF3_CLASSIC`, `NETCDF3_64BIT_OFFSET` or `NETCDF3_64BIT_DATA`. -If netcdf-c compression filter plugins are installed, and the -`HDF5_PLUGIN_PATH` environment variable is set to point to where the plugins -are installed, then `zstd`, `bzip2`, and the `blosc` family of compressors -can be used. If the HDF5 library is built with szip support, compression=`szip` can also +If the HDF5 library is built with szip support, compression=`szip` can also be used (in conjunction with the `szip_coding` and `szip_pixels_per_block` keyword arguments). @@ -2707,10 +2699,7 @@ compressed in the netCDF file using the specified compression algorithm. Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except -`zlib` and `szip` use the HDF5 plugin architecture, which requires that the -environment variable `HDF5_PLUGIN_PATH` be set to the location of the external -plugins built by netcdf-c (unless the plugins are installed in the -default location `/usr/local/hdf5/lib/plugin`). +`zlib` and `szip` use the HDF5 plugin architecture. If the optional keyword `zlib` is `True`, the data will be compressed in the netCDF file using zlib compression (default `False`). The use of this option is @@ -3731,10 +3720,7 @@ behavior is similar to Fortran or Matlab, but different than numpy. Currently `zlib`,`szip`,`zstd`,`bzip2`,`blosc_lz`,`blosc_lz4`,`blosc_lz4hc`, `blosc_zlib` and `blosc_zstd` are supported. Default is `None` (no compression). All of the compressors except - `zlib` use the HDF5 plugin architecture, which requires that the - environment variable `HDF5_PLUGIN_PATH` be set to the location of the - plugins built by netcdf-c (unless the plugins are installed in the - default location `/usr/local/hdf5/lib`). + `zlib` and `szip` use the HDF5 plugin architecture. **`zlib`**: if `True`, data assigned to the `Variable` instance is compressed on disk. Default `False`. 
Deprecated - use From 7fc0be8cbfbed62f151faf5c22b3c9509d9d99cf Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 11 May 2022 08:29:59 -0600 Subject: [PATCH 65/92] fix error in data_files when NETCDF_PLUGIN_DIR --- setup.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.py b/setup.py index cd7d39715..beeb0950f 100644 --- a/setup.py +++ b/setup.py @@ -689,7 +689,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): plugins = glob.glob(os.path.join(plugin_dir, "*.so")) data_files = plugins else: - data_files = None + data_files = [] setup(name="netCDF4", cmdclass=cmdclass, From 788b0652744f1f59773370892b9ade91d50ca719 Mon Sep 17 00:00:00 2001 From: jswhit Date: Wed, 11 May 2022 09:48:38 -0600 Subject: [PATCH 66/92] print whether plugins installed --- setup.py | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/setup.py b/setup.py index beeb0950f..81ab37576 100644 --- a/setup.py +++ b/setup.py @@ -687,8 +687,16 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs): if os.environ.get("NETCDF_PLUGIN_DIR"): plugin_dir = os.environ.get("NETCDF_PLUGIN_DIR") plugins = glob.glob(os.path.join(plugin_dir, "*.so")) - data_files = plugins + if not plugins: + sys.stdout.write('no .so files in NETCDF_PLUGIN_DIR, no plugin shared objects installed\n') + data_files = [] + else: + data_files = plugins + sys.stdout.write('installing plugin shared objects from %s ...\n' % plugin_dir) + sofiles = [os.path.basename(sofilepath) for sofilepath in data_files] + sys.stdout.write(repr(sofiles)+'\n') else: + sys.stdout.write('NETCDF_PLUGIN_DIR not set, no plugin shared objects installed\n') data_files = [] setup(name="netCDF4", From ad94b2dbbf234282fd2fabdf7dc534ab5291ef04 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sat, 14 May 2022 11:07:25 -0600 Subject: [PATCH 67/92] update --- test/tst_compression_blosc.py | 37 +++++++++++++++++++---------------- 1 file changed, 20 insertions(+), 17 deletions(-) diff --git a/test/tst_compression_blosc.py b/test/tst_compression_blosc.py index 66ca0e9e4..891f53b90 100644 --- a/test/tst_compression_blosc.py +++ b/test/tst_compression_blosc.py @@ -4,24 +4,27 @@ import os, tempfile, unittest ndim = 100000 +iblosc_shuffle=2 +iblosc_blocksize=800000 +iblosc_complevel=4 filename = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name datarr = uniform(size=(ndim,)) -def write_netcdf(filename,dtype='f8',complevel=6): +def write_netcdf(filename,dtype='f8',blosc_shuffle=1,blosc_blocksize=500000,complevel=6): nc = Dataset(filename,'w') nc.createDimension('n', ndim) foo = nc.createVariable('data',\ dtype,('n'),compression=None) foo_lz = nc.createVariable('data_lz',\ - dtype,('n'),compression='blosc_lz',blosc_shuffle=2,complevel=complevel) + dtype,('n'),compression='blosc_lz',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) foo_lz4 = nc.createVariable('data_lz4',\ - dtype,('n'),compression='blosc_lz4',blosc_shuffle=2,complevel=complevel) + dtype,('n'),compression='blosc_lz4',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) foo_lz4hc = nc.createVariable('data_lz4hc',\ - dtype,('n'),compression='blosc_lz4hc',blosc_shuffle=2,complevel=complevel) + dtype,('n'),compression='blosc_lz4hc',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) foo_zlib = nc.createVariable('data_zlib',\ - dtype,('n'),compression='blosc_zlib',blosc_shuffle=2,complevel=complevel) + 
dtype,('n'),compression='blosc_zlib',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) foo_zstd = nc.createVariable('data_zstd',\ - dtype,('n'),compression='blosc_zstd',blosc_shuffle=2,complevel=complevel) + dtype,('n'),compression='blosc_zstd',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) foo_lz[:] = datarr foo_lz4[:] = datarr foo_lz4hc[:] = datarr @@ -33,7 +36,7 @@ class CompressionTestCase(unittest.TestCase): def setUp(self): self.filename = filename - write_netcdf(self.filename,complevel=4) # with compression + write_netcdf(self.filename,complevel=iblosc_complevel,blosc_shuffle=iblosc_shuffle,blosc_blocksize=iblosc_blocksize) # with compression def tearDown(self): # Remove the temporary files @@ -46,28 +49,28 @@ def runTest(self): {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_lz'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz', 'shuffle': 2, 'blocksize': 800000}, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} + {'compressor': 'blosc_lz', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz4', 'shuffle': 2, 'blocksize': 800000}, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} + {'compressor': 'blosc_lz4', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz4'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4hc'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz4hc', 'shuffle': 2, 'blocksize': 800000}, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} + {'compressor': 'blosc_lz4hc', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz4hc'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zlib'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_zlib', 'shuffle': 2, 'blocksize': 800000}, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} + {'compressor': 'blosc_zlib', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_zlib'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zstd'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_zstd', 'shuffle': 2, 'blocksize': 800000}, - 'shuffle': False, 'complevel': 4, 'fletcher32': False} + {'compressor': 'blosc_zstd', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_zstd'].filters() == dtest f.close() From fa81251f03dc0643ad21059b95d4c025159eea16 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sat, 14 May 2022 11:30:15 -0600 Subject: [PATCH 68/92] remove blosc_blocksize kwarg, netcdf-c ignores it and uses var chunksize --- docs/index.html | 447 
+++++++++++++++------------------- src/netCDF4/_netCDF4.pyx | 24 +- test/tst_compression_blosc.py | 25 +- 3 files changed, 224 insertions(+), 272 deletions(-) diff --git a/docs/index.html b/docs/index.html index 0164a5609..c923bcdb4 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,24 +3,24 @@ - + netCDF4 API documentation + - - - - - - - -

    @@ -577,7 +576,7 @@

    Creating/Opening/Closing a netCDF

    Here's an example:

    -
    >>> from netCDF4 import Dataset
    +
    >>> from netCDF4 import Dataset
     >>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
     >>> print(rootgrp.data_model)
     NETCDF4
    @@ -606,7 +605,7 @@ 

    Groups in a netCDF file

    NETCDF4 formatted files support Groups, if you try to create a Group in a netCDF 3 file you will get an error message.

    -
    >>> rootgrp = Dataset("test.nc", "a")
    +
    >>> rootgrp = Dataset("test.nc", "a")
     >>> fcstgrp = rootgrp.createGroup("forecasts")
     >>> analgrp = rootgrp.createGroup("analyses")
     >>> print(rootgrp.groups)
    @@ -630,7 +629,7 @@ 

    Groups in a netCDF file

    that group. To simplify the creation of nested groups, you can use a unix-like path as an argument to Dataset.createGroup.

    -
    >>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
    +
    >>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
     >>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")
     
    @@ -644,7 +643,7 @@

    Groups in a netCDF file

to walk the directory tree. Note that printing the Dataset or Group object yields summary information about its contents.

    -
    >>> def walktree(top):
    +
    >>> def walktree(top):
     ...     yield top.groups.values()
     ...     for value in top.groups.values():
     ...         yield from walktree(value)
    @@ -694,7 +693,7 @@ 

    Dimensions in a netCDF file

    dimension is a new netCDF 4 feature, in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.

    -
    >>> level = rootgrp.createDimension("level", None)
    +
    >>> level = rootgrp.createDimension("level", None)
     >>> time = rootgrp.createDimension("time", None)
     >>> lat = rootgrp.createDimension("lat", 73)
     >>> lon = rootgrp.createDimension("lon", 144)
    @@ -702,7 +701,7 @@ 

    Dimensions in a netCDF file

    All of the Dimension instances are stored in a python dictionary.

    -
    >>> print(rootgrp.dimensions)
    +
    >>> print(rootgrp.dimensions)
     {'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}
     
    @@ -711,7 +710,7 @@

    Dimensions in a netCDF file

    Dimension.isunlimited method of a Dimension instance be used to determine if the dimensions is unlimited, or appendable.

    -
    >>> print(len(lon))
    +
    >>> print(len(lon))
     144
     >>> print(lon.isunlimited())
     False
    @@ -723,7 +722,7 @@ 

    Dimensions in a netCDF file

    provides useful summary info, including the name and length of the dimension, and whether it is unlimited.

    -
    >>> for dimobj in rootgrp.dimensions.values():
    +
    >>> for dimobj in rootgrp.dimensions.values():
     ...     print(dimobj)
     <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
     <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
    @@ -768,7 +767,7 @@ 

    Variables in a netCDF file

    method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

    -
    >>> times = rootgrp.createVariable("time","f8",("time",))
    +
    >>> times = rootgrp.createVariable("time","f8",("time",))
     >>> levels = rootgrp.createVariable("level","i4",("level",))
     >>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
     >>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
    @@ -780,7 +779,7 @@ 

    Variables in a netCDF file

    To get summary info on a Variable instance in an interactive session, just print it.

    -
    >>> print(temp)
    +
    >>> print(temp)
     <class 'netCDF4._netCDF4.Variable'>
     float32 temp(time, level, lat, lon)
         units: K
    @@ -791,7 +790,7 @@ 

    Variables in a netCDF file

    You can use a path to create a Variable inside a hierarchy of groups.

    -
    >>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
    +
    >>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
     

    If the intermediate groups do not yet exist, they will be created.

    @@ -799,7 +798,7 @@

    Variables in a netCDF file

    You can also query a Dataset or Group instance directly to obtain Group or Variable instances using paths.

    -
    >>> print(rootgrp["/forecasts/model1"])  # a Group instance
    +
    >>> print(rootgrp["/forecasts/model1"])  # a Group instance
     <class 'netCDF4._netCDF4.Group'>
     group /forecasts/model1:
         dimensions(sizes): 
    @@ -817,7 +816,7 @@ 

    Variables in a netCDF file

    All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:

    -
    >>> print(rootgrp.variables)
    +
    >>> print(rootgrp.variables)
     {'time': <class 'netCDF4._netCDF4.Variable'>
     float64 time(time)
     unlimited dimensions: time
    @@ -860,7 +859,7 @@ 

    Attributes in a netCDF file

    variables. Attributes can be strings, numbers or sequences. Returning to our example,

    -
    >>> import time
    +
    >>> import time
     >>> rootgrp.description = "bogus example script"
     >>> rootgrp.history = "Created " + time.ctime(time.time())
     >>> rootgrp.source = "netCDF4 python module tutorial"
    @@ -878,7 +877,7 @@ 

    Attributes in a netCDF file

    built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

    -
    >>> for name in rootgrp.ncattrs():
    +
    >>> for name in rootgrp.ncattrs():
     ...     print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
     Global attr description = bogus example script
     Global attr history = Created Mon Jul  8 14:19:41 2019
    @@ -889,7 +888,7 @@ 

    Attributes in a netCDF file

    instance provides all the netCDF attribute name/value pairs in a python dictionary:

    -
    >>> print(rootgrp.__dict__)
    +
    >>> print(rootgrp.__dict__)
     {'description': 'bogus example script', 'history': 'Created Mon Jul  8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}
     
    @@ -902,7 +901,7 @@

    Writing data

    Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

    -
    >>> import numpy as np
    +
    >>> import numpy as np
     >>> lats =  np.arange(-90,91,2.5)
     >>> lons =  np.arange(-180,180,2.5)
     >>> latitudes[:] = lats
    @@ -922,7 +921,7 @@ 

    Writing data objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.

    -
    >>> # append along two unlimited dimensions by assigning to slice.
    +
    >>> # append along two unlimited dimensions by assigning to slice.
     >>> nlats = len(rootgrp.dimensions["lat"])
     >>> nlons = len(rootgrp.dimensions["lon"])
     >>> print("temp shape before adding data = {}".format(temp.shape))
    @@ -942,7 +941,7 @@ 

    Writing data along the level dimension of the variable temp, even though no data has yet been assigned to levels.

    -
    >>> # now, assign data to levels dimension variable.
    +
    >>> # now, assign data to levels dimension variable.
     >>> levels[:] =  [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
     
    @@ -955,7 +954,7 @@

    Writing data allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in fortran). This means that

    -
    >>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
    +
    >>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
     (4, 4)
     
    @@ -973,14 +972,14 @@

    Writing data

    For example,

    -
    >>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
    +
    >>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
     

    will extract time indices 0,2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

    -
    >>> print("shape of fancy temp slice = {}".format(tempdat.shape))
    +
    >>> print("shape of fancy temp slice = {}".format(tempdat.shape))
     shape of fancy temp slice = (3, 3, 36, 71)
     
    @@ -1013,7 +1012,7 @@

    Dealing with time coordinates

    provided by cftime to do just that. Here's an example of how they can be used:

    -
    >>> # fill in times.
    +
    >>> # fill in times.
     >>> from datetime import datetime, timedelta
     >>> from cftime import num2date, date2num
     >>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
    @@ -1053,7 +1052,7 @@ 

    Reading data from a multi NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

    -
    >>> for nf in range(10):
    +
    >>> for nf in range(10):
     ...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
     ...         _ = f.createDimension("x",None)
     ...         x = f.createVariable("x","i",("x",))
    @@ -1062,7 +1061,7 @@ 

    Reading data from a multi

    Now read all the files back in at once with MFDataset

    -
    >>> from netCDF4 import MFDataset
    +
    >>> from netCDF4 import MFDataset
     >>> f = MFDataset("mftest*nc")
     >>> print(f.variables["x"][:])
     [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    @@ -1129,22 +1128,22 @@ 

    Efficient compression of netC

    In our example, try replacing the line

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
     

    with

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
     

    and then

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
     

    or with netcdf-c >= 4.9.0

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
     

    and see how much smaller the resulting files are.
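One self-contained way to check the effect (a sketch only; filenames and the array shape are arbitrary, and the size reduction depends strongly on how compressible the data is):

    import os
    import numpy as np
    from netCDF4 import Dataset

    data = np.random.uniform(size=(100, 144)).astype("f4")
    for fname, kwargs in [("uncompressed.nc", {}),
                          ("compressed.nc", {"compression": "zlib",
                                             "least_significant_digit": 3})]:
        with Dataset(fname, "w") as nc:
            nc.createDimension("lat", 100)
            nc.createDimension("lon", 144)
            temp = nc.createVariable("temp", "f4", ("lat", "lon"), **kwargs)
            temp[:] = data
        print(fname, os.path.getsize(fname), "bytes")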

    @@ -1165,7 +1164,7 @@

    Beyond ho Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:

    -
    >>> f = Dataset("complex.nc","w")
    +
    >>> f = Dataset("complex.nc","w")
     >>> size = 3 # length of 1-d complex array
     >>> # create sample complex data.
     >>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
    @@ -1201,7 +1200,7 @@ 

    Beyond ho in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

    -
    >>> print(f)
    +
    >>> print(f)
     <class 'netCDF4._netCDF4.Dataset'>
     root group (NETCDF4 data model, file format HDF5):
         dimensions(sizes): x_dim(3)
    @@ -1226,7 +1225,7 @@ 

    Variable-length (vlen) data types

    data type, use the Dataset.createVLType method method of a Dataset or Group instance.

    -
    >>> f = Dataset("tst_vlen.nc","w")
    +
    >>> f = Dataset("tst_vlen.nc","w")
     >>> vlen_t = f.createVLType(np.int32, "phony_vlen")
     
    @@ -1236,7 +1235,7 @@

    Variable-length (vlen) data types

    but compound data types cannot. A new variable can then be created using this datatype.

    -
    >>> x = f.createDimension("x",3)
    +
    >>> x = f.createDimension("x",3)
     >>> y = f.createDimension("y",4)
     >>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
     
    @@ -1249,7 +1248,7 @@

    Variable-length (vlen) data types

    In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

    -
    >>> import random
    +
    >>> import random
     >>> random.seed(54321)
     >>> data = np.empty(len(y)*len(x),object)
     >>> for n in range(len(y)*len(x)):
    @@ -1289,7 +1288,7 @@ 

    Variable-length (vlen) data types

    with fixed length greater than 1) when calling the Dataset.createVariable method.

    -
    >>> z = f.createDimension("z",10)
    +
    >>> z = f.createDimension("z",10)
     >>> strvar = f.createVariable("strvar", str, "z")
     
    @@ -1297,7 +1296,7 @@

    Variable-length (vlen) data types

    random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

    -
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    +
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
     >>> data = np.empty(10,"O")
     >>> for n in range(10):
     ...     stringlen = random.randint(2,12)
    @@ -1336,7 +1335,7 @@ 

    Enum data type

    values and their names are used to define an Enum data type using Dataset.createEnumType.

    -
    >>> nc = Dataset('clouds.nc','w')
    +
    >>> nc = Dataset('clouds.nc','w')
     >>> # python dict with allowed values and their names.
     >>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
     ... 'Stratus': 2, 'Clear': 0,
    @@ -1354,7 +1353,7 @@ 

    Enum data type

    is made to write an integer value not associated with one of the specified names.

    -
    >>> time = nc.createDimension('time',None)
    +
    >>> time = nc.createDimension('time',None)
     >>> # create a 1d variable of type 'cloud_type'.
     >>> # The fill_value is set to the 'Missing' named value.
     >>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
    @@ -1391,7 +1390,7 @@ 

    Parallel IO

    available. To use parallel IO, your program must be running in an MPI environment using mpi4py.

    -
    >>> from mpi4py import MPI
    +
    >>> from mpi4py import MPI
     >>> import numpy as np
     >>> from netCDF4 import Dataset
     >>> rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)
    @@ -1403,7 +1402,7 @@ 

    Parallel IO

    when a new dataset is created or an existing dataset is opened, use the parallel keyword to enable parallel access.

    -
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
    +
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
     

    The optional comm keyword may be used to specify a particular @@ -1411,7 +1410,7 @@

    Parallel IO

can now write to the file independently. In this example the process rank is written to a different variable index on each task

    -
    >>> d = nc.createDimension('dim',4)
    +
    >>> d = nc.createDimension('dim',4)
     >>> v = nc.createVariable('var', np.int64, 'dim')
     >>> v[rank] = rank
     >>> nc.close()
    @@ -1478,7 +1477,7 @@ 

    Dealing with strings

    stringtochar is used to convert the numpy string array to an array of characters with one more dimension. For example,

    -
    >>> from netCDF4 import stringtochar
    +
    >>> from netCDF4 import stringtochar
     >>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
     >>> _ = nc.createDimension('nchars',3)
     >>> _ = nc.createDimension('nstrings',None)
    @@ -1511,7 +1510,7 @@ 

    Dealing with strings

    character array dtype under the hood when creating the netcdf compound type. Here's an example:

    -
    >>> nc = Dataset('compoundstring_example.nc','w')
    +
    >>> nc = Dataset('compoundstring_example.nc','w')
     >>> dtype = np.dtype([('observation', 'f4'),
     ...                      ('station_name','S10')])
     >>> station_data_t = nc.createCompoundType(dtype,'station_data')
    @@ -1556,7 +1555,7 @@ 

    In-memory (diskless) Datasets

    object representing the Dataset. Below are examples illustrating both approaches.

    -
    >>> # create a diskless (in-memory) Dataset,
    +
    >>> # create a diskless (in-memory) Dataset,
     >>> # and persist the file to disk when it is closed.
     >>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
     >>> d = nc.createDimension('x',None)
    @@ -1618,7 +1617,7 @@ 

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -1631,7 +1630,7 @@

    In-memory (diskless) Datasets

    View Source -
    # init for netCDF4. package
    +            
    # init for netCDF4. package
     # Docstring comes from extension module _netCDF4.
     from ._netCDF4 import *
     # Need explicit imports for names beginning with underscores
    @@ -1663,7 +1662,7 @@ 

    In-memory (diskless) Datasets

    Dataset:
    - +

    A netCDF Dataset is a collection of dimensions, groups, variables and attributes. Together they describe the meaning of data and relations among data fields stored in a netCDF file. See Dataset.__init__ for more @@ -1741,7 +1740,7 @@

    In-memory (diskless) Datasets

    Dataset()
    - +

    __init__(self, filename, mode="r", clobber=True, diskless=False, persist=False, keepweakref=False, memory=None, encoding=None, parallel=False, comm=None, info=None, format='NETCDF4')

    @@ -1847,7 +1846,7 @@

    In-memory (diskless) Datasets

    filepath(unknown):
    - +

    filepath(self,encoding=None)

    Get the file system path (or the opendap URL) which was used to @@ -1866,7 +1865,7 @@

    In-memory (diskless) Datasets

    close(unknown):
    - +

    close(self)

    Close the Dataset.

    @@ -1882,7 +1881,7 @@

    In-memory (diskless) Datasets

    isopen(unknown):
    - +

    isopen(self)

    Is the Dataset open or closed?

    @@ -1898,7 +1897,7 @@

    In-memory (diskless) Datasets

    sync(unknown):
    - +

    sync(self)

    Writes all buffered data in the Dataset to the disk file.

    @@ -1914,7 +1913,7 @@

    In-memory (diskless) Datasets

    set_fill_on(unknown):
    - +

    set_fill_on(self)

    Sets the fill mode for a Dataset open for writing to on.

    @@ -1938,7 +1937,7 @@

    In-memory (diskless) Datasets

    set_fill_off(unknown):
    - +

    set_fill_off(self)

    Sets the fill mode for a Dataset open for writing to off.

    @@ -1958,7 +1957,7 @@

    In-memory (diskless) Datasets

    createDimension(unknown):
    - +

    createDimension(self, dimname, size=None)

    Creates a new dimension with the given dimname and size.

    @@ -1982,7 +1981,7 @@

    In-memory (diskless) Datasets

    renameDimension(unknown):
    - +

    renameDimension(self, oldname, newname)

    rename a Dimension named oldname to newname.

    @@ -1998,7 +1997,7 @@

    In-memory (diskless) Datasets

    createCompoundType(unknown):
    - +

    createCompoundType(self, datatype, datatype_name)

    Creates a new compound data type named datatype_name from the numpy @@ -2023,7 +2022,7 @@

    In-memory (diskless) Datasets

    createVLType(unknown):
    - +

    createVLType(self, datatype, datatype_name)

    Creates a new VLEN data type named datatype_name from a numpy @@ -2043,7 +2042,7 @@

    In-memory (diskless) Datasets

    createEnumType(unknown):
    - +

    createEnumType(self, datatype, datatype_name, enum_dict)

    Creates a new Enum data type named datatype_name from a numpy @@ -2064,10 +2063,10 @@

    In-memory (diskless) Datasets

    createVariable(unknown):
    - +

    createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, -szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, +szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)

    @@ -2121,11 +2120,9 @@

    In-memory (diskless) Datasets

    significantly improves compression. Default is True. Ignored if zlib=False.

    -

    The optional kwargs blosc_shuffle and blosc_blocksize are ignored +

The optional kwarg blosc_shuffle is ignored unless the blosc compressor is used. blosc_shuffle can be 0 (no shuffle), -1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1. blosc_blocksize -is the tunable blosc blocksize in bytes (Default 0 means the blocksize is -chosen internally).

    +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1.
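For example (a sketch, not part of the patch; it assumes the blosc filter plugin can be found via HDF5_PLUGIN_PATH, and the filename is illustrative):

    from netCDF4 import Dataset

    nc = Dataset("blosc_sketch.nc", "w")
    nc.createDimension("n", 100000)
    v = nc.createVariable("data", "f8", ("n",),
                          compression="blosc_zstd",
                          blosc_shuffle=2,   # 0 = none, 1 = byte-wise, 2 = bit-wise
                          complevel=4)
    nc.close()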

    The optional kwargs szip_coding and szip_pixels_per_block are ignored unless the szip compressor is used. szip_coding can be ec (entropy coding) @@ -2161,7 +2158,7 @@

    In-memory (diskless) Datasets

    The optional keyword fill_value can be used to override the default netCDF _FillValue (the value that the variable gets filled with before -any data is written to it, defaults given in the dict netCDF4.default_fillvals). +any data is written to it, defaults given in the dict netCDF4.default_fillvals). If fill_value is set to False, then the variable is not pre-filled.

    If the optional keyword parameters least_significant_digit or significant_digits are @@ -2229,7 +2226,7 @@

    In-memory (diskless) Datasets

    renameVariable(unknown):
    - +

    renameVariable(self, oldname, newname)

    rename a Variable named oldname to newname

    @@ -2245,7 +2242,7 @@

    In-memory (diskless) Datasets

    createGroup(unknown):
    - +

    createGroup(self, groupname)

    Creates a new Group with the given groupname.

    @@ -2271,7 +2268,7 @@

    In-memory (diskless) Datasets

    ncattrs(unknown):
    - +

    ncattrs(self)

    return netCDF global attribute names for this Dataset or Group in a list.

    @@ -2287,7 +2284,7 @@

    In-memory (diskless) Datasets

    setncattr(unknown):
    - +

    setncattr(self,name,value)

    set a netCDF dataset or group attribute using name,value pair. @@ -2305,7 +2302,7 @@

    In-memory (diskless) Datasets

    setncattr_string(unknown):
    - +

    setncattr_string(self,name,value)

    set a netCDF dataset or group string attribute using name,value pair. @@ -2323,7 +2320,7 @@

    In-memory (diskless) Datasets

    setncatts(unknown):
    - +

    setncatts(self,attdict)

    set a bunch of netCDF dataset or group attributes at once using a python dictionary. @@ -2342,7 +2339,7 @@

    In-memory (diskless) Datasets

    getncattr(unknown):
    - +

    getncattr(self,name)

    retrieve a netCDF dataset or group attribute. @@ -2363,7 +2360,7 @@

    In-memory (diskless) Datasets

    delncattr(unknown):
    - +

    delncattr(self,name,value)

    delete a netCDF dataset or group attribute. Use if you need to delete a @@ -2381,7 +2378,7 @@

    In-memory (diskless) Datasets

    renameAttribute(unknown):
    - +

    renameAttribute(self, oldname, newname)

    rename a Dataset or Group attribute named oldname to newname.

    @@ -2397,7 +2394,7 @@

    In-memory (diskless) Datasets

    renameGroup(unknown):
    - +

    renameGroup(self, oldname, newname)

    rename a Group named oldname to newname (requires netcdf >= 4.3.1).

    @@ -2413,7 +2410,7 @@

    In-memory (diskless) Datasets

    set_auto_chartostring(unknown):
    - +

    set_auto_chartostring(self, True_or_False)

    Call Variable.set_auto_chartostring for all variables contained in this Dataset or @@ -2438,7 +2435,7 @@

    In-memory (diskless) Datasets

    set_auto_maskandscale(unknown):
    - +

    set_auto_maskandscale(self, True_or_False)

    Call Variable.set_auto_maskandscale for all variables contained in this Dataset or @@ -2461,7 +2458,7 @@

    In-memory (diskless) Datasets

    set_auto_mask(unknown):
    - +

    set_auto_mask(self, True_or_False)

    Call Variable.set_auto_mask for all variables contained in this Dataset or @@ -2485,7 +2482,7 @@

    In-memory (diskless) Datasets

    set_auto_scale(unknown):
    - +

    set_auto_scale(self, True_or_False)

    Call Variable.set_auto_scale for all variables contained in this Dataset or @@ -2508,7 +2505,7 @@

    In-memory (diskless) Datasets

    set_always_mask(unknown):
    - +

    set_always_mask(self, True_or_False)

    Call Variable.set_always_mask for all variables contained in @@ -2536,7 +2533,7 @@

    In-memory (diskless) Datasets

    set_ncstring_attrs(unknown):
    - +

    set_ncstring_attrs(self, True_or_False)

    Call Variable.set_ncstring_attrs for all variables contained in @@ -2561,7 +2558,7 @@

    In-memory (diskless) Datasets

    get_variables_by_attributes(unknown):
    - +

    get_variables_by_attribute(self, **kwargs)

    Returns a list of variables that match specific conditions.

    @@ -2569,7 +2566,7 @@

    In-memory (diskless) Datasets

    Can pass in key=value parameters and variables are returned that contain all of the matches. For example,

    -
    >>> # Get variables with x-axis attribute.
    +
    >>> # Get variables with x-axis attribute.
     >>> vs = nc.get_variables_by_attributes(axis='X')
     >>> # Get variables with matching "standard_name" attribute
     >>> vs = nc.get_variables_by_attributes(standard_name='northward_sea_water_velocity')
    @@ -2580,7 +2577,7 @@ 

    In-memory (diskless) Datasets

    the attribute value. None is given as the attribute value when the attribute does not exist on the variable. For example,

    -
    >>> # Get Axis variables
    +
    >>> # Get Axis variables
     >>> vs = nc.get_variables_by_attributes(axis=lambda v: v in ['X', 'Y', 'Z', 'T'])
     >>> # Get variables that don't have an "axis" attribute
     >>> vs = nc.get_variables_by_attributes(axis=lambda v: v is None)
    @@ -2599,7 +2596,7 @@ 

    In-memory (diskless) Datasets

    fromcdl(unknown):
    - +

    fromcdl(cdlfilename, ncfilename=None, mode='a',format='NETCDF4')

    call ncgen via subprocess to create Dataset from CDL @@ -2629,7 +2626,7 @@

    In-memory (diskless) Datasets

    tocdl(unknown):
    - +

    tocdl(self, coordvars=False, data=False, outfile=None)

    call ncdump via subprocess to create CDL @@ -2648,10 +2645,9 @@

    In-memory (diskless) Datasets

    #   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Dataset' objects>
    -

    string name of Group instance

    @@ -2660,121 +2656,109 @@

    In-memory (diskless) Datasets

    #   - groups + groups = <attribute 'groups' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - dimensions + dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - variables + variables = <attribute 'variables' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - disk_format + disk_format = <attribute 'disk_format' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - path + path = <attribute 'path' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - parent + parent = <attribute 'parent' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - file_format + file_format = <attribute 'file_format' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - data_model + data_model = <attribute 'data_model' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - cmptypes + cmptypes = <attribute 'cmptypes' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - vltypes + vltypes = <attribute 'vltypes' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - enumtypes + enumtypes = <attribute 'enumtypes' of 'netCDF4._netCDF4.Dataset' objects>
    -
    #   - keepweakref + keepweakref = <attribute 'keepweakref' of 'netCDF4._netCDF4.Dataset' objects>
    -

    @@ -2787,7 +2771,7 @@

    In-memory (diskless) Datasets

    Variable:
    - +

    A netCDF Variable is used to read and write netCDF data. They are analogous to numpy array objects. See Variable.__init__ for more details.

    @@ -2871,10 +2855,10 @@

    In-memory (diskless) Datasets

    Variable()
    - +

    __init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, -blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, +blosc_shuffle=1, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)

    @@ -2929,10 +2913,6 @@

    In-memory (diskless) Datasets

    Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise shuffle)). Default is 1. Ignored if blosc compressor not used.

    -

    blosc_blocksize: tunable blocksize in bytes for blosc -compressors. Default of 0 means blosc library chooses a blocksize. -Ignored if blosc compressor not used.

    -

    szip_coding: szip coding method. Can be ec (entropy coding) or nn (nearest neighbor coding). Default is nn. Ignored if szip compressor not used.

    @@ -2996,7 +2976,7 @@

    In-memory (diskless) Datasets

    value that the variable gets filled with before any data is written to it) is replaced with this value. If fill_value is set to False, then the variable is not pre-filled. The default netCDF fill values can be found -in the dictionary netCDF4.default_fillvals.

    +in the dictionary netCDF4.default_fillvals.

    chunk_cache: If specified, sets the chunk cache size for this variable. Persists as long as Dataset is open. Use set_var_chunk_cache to @@ -3017,7 +2997,7 @@

    In-memory (diskless) Datasets

    group(unknown):
    - +

    group(self)

    return the group that this Variable is a member of.

    @@ -3033,7 +3013,7 @@

    In-memory (diskless) Datasets

    ncattrs(unknown):
    - +

    ncattrs(self)

    return netCDF attribute names for this Variable in a list.

    @@ -3049,7 +3029,7 @@

    In-memory (diskless) Datasets

    setncattr(unknown):
    - +

    setncattr(self,name,value)

    set a netCDF variable attribute using name,value pair. Use if you need to set a @@ -3067,7 +3047,7 @@

    In-memory (diskless) Datasets

    setncattr_string(unknown):
    - +

    setncattr_string(self,name,value)

    set a netCDF variable string attribute using name,value pair. @@ -3086,7 +3066,7 @@

    In-memory (diskless) Datasets

    setncatts(unknown):
    - +

    setncatts(self,attdict)

    set a bunch of netCDF variable attributes at once using a python dictionary. @@ -3105,7 +3085,7 @@

    In-memory (diskless) Datasets

    getncattr(unknown):
    - +

    getncattr(self,name)

    retrieve a netCDF variable attribute. Use if you need to set a @@ -3126,7 +3106,7 @@

    In-memory (diskless) Datasets

    delncattr(unknown):
    - +

    delncattr(self,name,value)

    delete a netCDF variable attribute. Use if you need to delete a @@ -3144,7 +3124,7 @@

    In-memory (diskless) Datasets

    filters(unknown):
    - +

    filters(self)

    return dictionary containing HDF5 filter parameters.
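For a variable written with one of the blosc compressors, the returned dictionary should look roughly like the expected values asserted in test/tst_compression_blosc.py earlier in this series (a sketch; the reported blocksize is chosen by the library from the variable chunking):

    from netCDF4 import Dataset

    with Dataset("blosc_sketch.nc") as nc:      # file from the blosc sketch above
        print(nc.variables["data"].filters())
    # approximately:
    # {'zlib': False, 'szip': False, 'zstd': False, 'bzip2': False,
    #  'blosc': {'compressor': 'blosc_zstd', 'shuffle': 2, 'blocksize': ...},
    #  'shuffle': False, 'complevel': 4, 'fletcher32': False}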

    @@ -3160,7 +3140,7 @@

    In-memory (diskless) Datasets

    quantization(unknown):
    - +

    quantization(self)

    return number of significant digits and the algorithm used in quantization. @@ -3177,7 +3157,7 @@
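A minimal sketch pairing quantization with this query (assumes a netcdf-c >= 4.9.0 build with quantization support; the filename is illustrative):

    from netCDF4 import Dataset

    with Dataset("quantize_sketch.nc", "w") as nc:
        nc.createDimension("x", 10)
        t = nc.createVariable("t", "f4", ("x",), compression="zlib",
                              significant_digits=4, quantize_mode="BitGroom")
        # expected to report the number of significant digits and the algorithm,
        # e.g. (4, 'BitGroom')
        print(t.quantization())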

    In-memory (diskless) Datasets

    endian(unknown):
    - +

    endian(self)

    return endian-ness (little,big,native) of variable (as stored in HDF5 file).

    @@ -3193,7 +3173,7 @@

    In-memory (diskless) Datasets

    chunking(unknown):
    - +

    chunking(self)

    return variable chunking information. If the dataset is @@ -3212,7 +3192,7 @@

    In-memory (diskless) Datasets

    get_var_chunk_cache(unknown):
    - +

    get_var_chunk_cache(self)

    return variable chunk cache information in a tuple (size,nelems,preemption). @@ -3230,7 +3210,7 @@

    In-memory (diskless) Datasets

    set_var_chunk_cache(unknown):
    - +

    set_var_chunk_cache(self,size=None,nelems=None,preemption=None)

    change variable chunk cache settings. @@ -3248,7 +3228,7 @@

    In-memory (diskless) Datasets

    renameAttribute(unknown):
    - +

    renameAttribute(self, oldname, newname)

    rename a Variable attribute named oldname to newname.

    @@ -3264,7 +3244,7 @@

    In-memory (diskless) Datasets

    assignValue(unknown):
    - +

    assignValue(self, val)

    assign a value to a scalar variable. Provided for compatibility with @@ -3281,7 +3261,7 @@

    In-memory (diskless) Datasets

    getValue(unknown):
    - +

    getValue(self)

    get the value of a scalar variable. Provided for compatibility with @@ -3298,7 +3278,7 @@

    In-memory (diskless) Datasets

    set_auto_chartostring(unknown):
    - +

    set_auto_chartostring(self,chartostring)

    turn on or off automatic conversion of character variable data to and @@ -3329,7 +3309,7 @@

    In-memory (diskless) Datasets

    use_nc_get_vars(unknown):
    - +

    use_nc_get_vars(self,_use_get_vars)

    enable the use of netcdf library routine nc_get_vars @@ -3349,7 +3329,7 @@

    In-memory (diskless) Datasets

    set_auto_maskandscale(unknown):
    - +

    set_auto_maskandscale(self,maskandscale)

    turn on or off automatic conversion of variable data to and @@ -3413,7 +3393,7 @@

    In-memory (diskless) Datasets

    set_auto_scale(unknown):
    - +

    set_auto_scale(self,scale)

    turn on or off automatic packing/unpacking of variable @@ -3462,7 +3442,7 @@

    In-memory (diskless) Datasets

    set_auto_mask(unknown):
    - +

    set_auto_mask(self,mask)

    turn on or off automatic conversion of variable data to and @@ -3497,7 +3477,7 @@

    In-memory (diskless) Datasets

    set_always_mask(unknown):
    - +

    set_always_mask(self,always_mask)

    turn on or off conversion of data without missing values to regular @@ -3520,7 +3500,7 @@

    In-memory (diskless) Datasets

    set_ncstring_attrs(unknown):
    - +

    set_always_mask(self,ncstring_attrs)

    turn on or off creating NC_STRING string attributes.

    @@ -3542,7 +3522,7 @@

    In-memory (diskless) Datasets

    set_collective(unknown):
    - +

    set_collective(self,True_or_False)

    turn on or off collective parallel IO access. Ignored if file is not @@ -3559,7 +3539,7 @@

    In-memory (diskless) Datasets

    get_dims(unknown):
    - +

    get_dims(self)

    return a tuple of Dimension instances associated with this @@ -3571,10 +3551,9 @@

    In-memory (diskless) Datasets

    #   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Variable' objects>
    -

    string name of Variable instance

    @@ -3583,10 +3562,9 @@

    In-memory (diskless) Datasets

    #   - datatype + datatype = <attribute 'datatype' of 'netCDF4._netCDF4.Variable' objects>
    -

    numpy data type (for primitive data types) or VLType/CompoundType/EnumType instance (for compound, vlen or enum data types)

    @@ -3597,10 +3575,9 @@

    In-memory (diskless) Datasets

    #   - shape + shape = <attribute 'shape' of 'netCDF4._netCDF4.Variable' objects>
    -

    find current sizes of all variable dimensions

    @@ -3609,10 +3586,9 @@

    In-memory (diskless) Datasets

    #   - size + size = <attribute 'size' of 'netCDF4._netCDF4.Variable' objects>
    -

    Return the number of stored elements.

    @@ -3621,10 +3597,9 @@

    In-memory (diskless) Datasets

    #   - dimensions + dimensions = <attribute 'dimensions' of 'netCDF4._netCDF4.Variable' objects>
    -

    get variables's dimension names

    @@ -3633,61 +3608,55 @@

    In-memory (diskless) Datasets

    #   - ndim + ndim = <attribute 'ndim' of 'netCDF4._netCDF4.Variable' objects>
    -
    #   - dtype + dtype = <attribute 'dtype' of 'netCDF4._netCDF4.Variable' objects>
    -
    #   - mask + mask = <attribute 'mask' of 'netCDF4._netCDF4.Variable' objects>
    -
    #   - scale + scale = <attribute 'scale' of 'netCDF4._netCDF4.Variable' objects>
    -
    #   - always_mask + always_mask = <attribute 'always_mask' of 'netCDF4._netCDF4.Variable' objects>
    -
    #   - chartostring + chartostring = <attribute 'chartostring' of 'netCDF4._netCDF4.Variable' objects>
    -
    @@ -3700,7 +3669,7 @@

    In-memory (diskless) Datasets

    Dimension:
    - +

    A netCDF Dimension is used to describe the coordinates of a Variable. See Dimension.__init__ for more details.

    @@ -3726,7 +3695,7 @@

    In-memory (diskless) Datasets

    Dimension()
    - +

    __init__(self, group, name, size=None)

    Dimension constructor.

    @@ -3752,7 +3721,7 @@

    In-memory (diskless) Datasets

    group(unknown):
    - +

    group(self)

    return the group that this Dimension is a member of.

    @@ -3768,7 +3737,7 @@

    In-memory (diskless) Datasets

    isunlimited(unknown):
    - +

    isunlimited(self)

    returns True if the Dimension instance is unlimited, False otherwise.

    @@ -3779,10 +3748,9 @@

    In-memory (diskless) Datasets

    #   - name + name = <attribute 'name' of 'netCDF4._netCDF4.Dimension' objects>
    -

    string name of Dimension instance

    @@ -3791,10 +3759,9 @@

    In-memory (diskless) Datasets

    #   - size + size = <attribute 'size' of 'netCDF4._netCDF4.Dimension' objects>
    -

    current size of Dimension (calls len on Dimension instance)

    @@ -3810,7 +3777,7 @@

    In-memory (diskless) Datasets

    Group(netCDF4.Dataset):
    - +

[docs/index.html diff, continued: regenerated pdoc reference entries for Group, MFDataset, MFTime, CompoundType, VLType, EnumType and the module-level helpers (date2num, num2date, date2index, stringtochar, chartostring, stringtoarr, getlibversion, get_chunk_cache). Only HTML anchors and pdoc attribute markup change in these hunks; the rendered docstring text is unchanged. The final hunk covers set_chunk_cache:]
    set_chunk_cache(self,size=None,nelems=None,preemption=None)

    change netCDF4 chunk cache settings. diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx index 278f21224..b9897aa34 100644 --- a/src/netCDF4/_netCDF4.pyx +++ b/src/netCDF4/_netCDF4.pyx @@ -2654,13 +2654,13 @@ datatype.""" compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn',szip_pixels_per_block=8, - blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, + blosc_shuffle=1,fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None): """ **`createVariable(self, varname, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, fletcher32=False, contiguous=False, chunksizes=None, -szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, blosc_blocksize=0, +szip_coding='nn', szip_pixels_per_block=8, blosc_shuffle=1, endian='native', least_significant_digit=None, significant_digits=None, quantize_mode='BitGroom', fill_value=None, chunk_cache=None)`** @@ -2714,11 +2714,9 @@ will be applied before compressing the data with zlib (default `True`). This significantly improves compression. Default is `True`. Ignored if `zlib=False`. -The optional kwargs `blosc_shuffle` and `blosc_blocksize` are ignored +The optional kwarg `blosc_shuffle`is ignored unless the blosc compressor is used. `blosc_shuffle` can be 0 (no shuffle), -1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1. `blosc_blocksize` -is the tunable blosc blocksize in bytes (Default 0 means the blocksize is -chosen internally). +1 (byte-wise shuffle) or 2 (bit-wise shuffle). Default is 1. The optional kwargs `szip_coding` and `szip_pixels_per_block` are ignored unless the szip compressor is used. `szip_coding` can be `ec` (entropy coding) @@ -2834,7 +2832,7 @@ is the number of variable dimensions.""" group.variables[varname] = Variable(group, varname, datatype, dimensions=dimensions, compression=compression, zlib=zlib, complevel=complevel, shuffle=shuffle, szip_coding=szip_coding, szip_pixels_per_block=szip_pixels_per_block, - blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize, + blosc_shuffle=blosc_shuffle, fletcher32=fletcher32, contiguous=contiguous, chunksizes=chunksizes, endian=endian, least_significant_digit=least_significant_digit, significant_digits=significant_digits,quantize_mode=quantize_mode,fill_value=fill_value, chunk_cache=chunk_cache) @@ -3676,14 +3674,14 @@ behavior is similar to Fortran or Matlab, but different than numpy. def __init__(self, grp, name, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, - blosc_shuffle=1, blosc_blocksize=0, + blosc_shuffle=1, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None, significant_digits=None,quantize_mode='BitGroom',fill_value=None, chunk_cache=None, **kwargs): """ **`__init__(self, group, name, datatype, dimensions=(), compression=None, zlib=False, complevel=4, shuffle=True, szip_coding='nn', szip_pixels_per_block=8, - blosc_shuffle=1, blosc_blocksize=0, fletcher32=False, contiguous=False, + blosc_shuffle=1, fletcher32=False, contiguous=False, chunksizes=None, endian='native', least_significant_digit=None,fill_value=None,chunk_cache=None)`** @@ -3738,10 +3736,6 @@ behavior is similar to Fortran or Matlab, but different than numpy. Can be 0 (no blosc shuffle), 1 (bytewise shuffle) or 2 (bitwise shuffle)). Default is 1. 
Ignored if blosc compressor not used. - **`blosc_blocksize`**: tunable blocksize in bytes for blosc - compressors. Default of 0 means blosc library chooses a blocksize. - Ignored if blosc compressor not used. - **`szip_coding`**: szip coding method. Can be `ec` (entropy coding) or `nn` (nearest neighbor coding). Default is `nn`. Ignored if szip compressor not used. @@ -4045,7 +4039,7 @@ version 4.9.0 or higher netcdf-c with bzip2 support, and rebuild netcdf4-python. IF HAS_BLOSC_SUPPORT: iblosc_compressor = _blosc_dict[compression] iblosc_shuffle = blosc_shuffle - iblosc_blocksize = blosc_blocksize + iblosc_blocksize = 0 # not currently used by c lib iblosc_complevel = complevel ierr = nc_def_var_blosc(self._grpid, self._varid,\ iblosc_compressor,\ @@ -4492,7 +4486,7 @@ return dictionary containing HDF5 filter parameters.""" filtdict['complevel']=icomplevel_bzip2 if iblosc: blosc_compressor = iblosc_compressor - filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle,'blocksize':iblosc_blocksize} + filtdict['blosc']={'compressor':_blosc_dict_inv[blosc_compressor],'shuffle':iblosc_shuffle} filtdict['complevel']=iblosc_complevel if iszip: szip_coding = iszip_coding diff --git a/test/tst_compression_blosc.py b/test/tst_compression_blosc.py index 891f53b90..da31d3ce1 100644 --- a/test/tst_compression_blosc.py +++ b/test/tst_compression_blosc.py @@ -5,26 +5,25 @@ ndim = 100000 iblosc_shuffle=2 -iblosc_blocksize=800000 iblosc_complevel=4 filename = tempfile.NamedTemporaryFile(suffix='.nc', delete=False).name datarr = uniform(size=(ndim,)) -def write_netcdf(filename,dtype='f8',blosc_shuffle=1,blosc_blocksize=500000,complevel=6): +def write_netcdf(filename,dtype='f8',blosc_shuffle=1,complevel=6): nc = Dataset(filename,'w') nc.createDimension('n', ndim) foo = nc.createVariable('data',\ dtype,('n'),compression=None) foo_lz = nc.createVariable('data_lz',\ - dtype,('n'),compression='blosc_lz',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) + dtype,('n'),compression='blosc_lz',blosc_shuffle=blosc_shuffle,complevel=complevel) foo_lz4 = nc.createVariable('data_lz4',\ - dtype,('n'),compression='blosc_lz4',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) + dtype,('n'),compression='blosc_lz4',blosc_shuffle=blosc_shuffle,complevel=complevel) foo_lz4hc = nc.createVariable('data_lz4hc',\ - dtype,('n'),compression='blosc_lz4hc',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) + dtype,('n'),compression='blosc_lz4hc',blosc_shuffle=blosc_shuffle,complevel=complevel) foo_zlib = nc.createVariable('data_zlib',\ - dtype,('n'),compression='blosc_zlib',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) + dtype,('n'),compression='blosc_zlib',blosc_shuffle=blosc_shuffle,complevel=complevel) foo_zstd = nc.createVariable('data_zstd',\ - dtype,('n'),compression='blosc_zstd',blosc_shuffle=blosc_shuffle,blosc_blocksize=blosc_blocksize,complevel=complevel) + dtype,('n'),compression='blosc_zstd',blosc_shuffle=blosc_shuffle,complevel=complevel) foo_lz[:] = datarr foo_lz4[:] = datarr foo_lz4hc[:] = datarr @@ -36,7 +35,7 @@ class CompressionTestCase(unittest.TestCase): def setUp(self): self.filename = filename - write_netcdf(self.filename,complevel=iblosc_complevel,blosc_shuffle=iblosc_shuffle,blosc_blocksize=iblosc_blocksize) # with compression + write_netcdf(self.filename,complevel=iblosc_complevel,blosc_shuffle=iblosc_shuffle) def tearDown(self): # Remove the temporary 
files @@ -49,27 +48,27 @@ def runTest(self): {'zlib':False,'szip':False,'zstd':False,'bzip2':False,'blosc':False,'shuffle':False,'complevel':0,'fletcher32':False} assert_almost_equal(datarr,f.variables['data_lz'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + {'compressor': 'blosc_lz', 'shuffle': iblosc_shuffle}, 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz4', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + {'compressor': 'blosc_lz4', 'shuffle': iblosc_shuffle}, 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz4'].filters() == dtest assert_almost_equal(datarr,f.variables['data_lz4hc'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_lz4hc', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + {'compressor': 'blosc_lz4hc', 'shuffle': iblosc_shuffle}, 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_lz4hc'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zlib'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_zlib', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + {'compressor': 'blosc_zlib', 'shuffle': iblosc_shuffle}, 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_zlib'].filters() == dtest assert_almost_equal(datarr,f.variables['data_zstd'][:]) dtest = {'zlib': False, 'szip':False, 'zstd': False, 'bzip2': False, 'blosc': - {'compressor': 'blosc_zstd', 'shuffle': iblosc_shuffle, 'blocksize': iblosc_blocksize}, + {'compressor': 'blosc_zstd', 'shuffle': iblosc_shuffle}, 'shuffle': False, 'complevel': iblosc_complevel, 'fletcher32': False} assert f.variables['data_zstd'].filters() == dtest f.close() From 9d2faeb84570ffa1713d8dba1dde310b9b370b35 Mon Sep 17 00:00:00 2001 From: Jeff Whitaker Date: Sun, 15 May 2022 08:44:16 -0600 Subject: [PATCH 69/92] update --- Changelog | 10 +++++++--- README.md | 4 ++-- 2 files changed, 9 insertions(+), 5 deletions(-) diff --git a/Changelog b/Changelog index eebb7f378..435a6bdc6 100644 --- a/Changelog +++ b/Changelog @@ -14,11 +14,15 @@ * add 'compression' kwarg to createVariable to enable new compression functionality in netcdf-c 4.9.0. 'None','zlib','szip','zstd','bzip2' 'blosc_lz','blosc_lz4','blosc_lz4hc','blosc_zlib' and 'blosc_zstd' - are currently supported. 'blosc_shuffle', 'blosc_blocksize', + are currently supported. 'blosc_shuffle', 'szip_mask' and 'szip_pixels_per_block' kwargs also added. compression='zlib' is equivalent to (the now deprecated) zlib=True. - Using new compressors (except 'szip') requires setting HDF5_PLUGIN_PATH to point to - the installation location of the netcdf-c filter plugins. + If the environment variable NETCDF_PLUGIN_DIR is set to point to the + directory with the HDF5 plugin .so files, then the compression plugins will + be installed within the package and be automatically available (the binary + wheels have this). Otherwise, the environment variable HDF5_PLUGIN_PATH + needs to be se to point to plugins in order to use the new compression + options. 
* MFDataset did not aggregate 'name' variable attribute (issue #1153). * issue warning instead of raising an exception if missing_value or _FillValue can't be cast to the variable type when creating a diff --git a/README.md b/README.md index 7615d10d1..453648b80 100644 --- a/README.md +++ b/README.md @@ -13,8 +13,8 @@ For details on the latest updates, see the [Changelog](https://github.com/Unidat ??/??/2022: Version [1.6.0](https://pypi.python.org/pypi/netCDF4/1.6.0) released. Support for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead -of just dimension names). 'compression' kwarg added to Dataset.createVariable to support szip as well as -new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such +of just dimension names). 'compression' kwarg added to Dataset.createVariable to support szip as +well as new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such as zstd, bzip2 and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi. 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10. From fa5791c0c9c51441e8fafd8cdb07793243d59ab6 Mon Sep 17 00:00:00 2001 From: jswhit Date: Mon, 16 May 2022 11:08:23 -0600 Subject: [PATCH 70/92] update --- src/netCDF4/__init__.py | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py index f53195897..b6f9d2293 100644 --- a/src/netCDF4/__init__.py +++ b/src/netCDF4/__init__.py @@ -13,6 +13,6 @@ import os __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] -# if HDF5_PLUGIN_PATH not set, point to plugins directory inside package -if 'HDF5_PLUGIN_PATH' not in os.environ: +# if HDF5_PLUGIN_PATH not set, point to package path if libh5noop.so exists there +if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'libh5noop.so')): os.environ['HDF5_PLUGIN_PATH']=__path__[0] From 09078033a6b22f621e9f714c3a94474685a64a61 Mon Sep 17 00:00:00 2001 From: jswhit Date: Mon, 16 May 2022 11:19:15 -0600 Subject: [PATCH 71/92] update docs --- docs/index.html | 443 +++++++++++++++++++++------------------ src/netCDF4/_netCDF4.pyx | 6 + 2 files changed, 248 insertions(+), 201 deletions(-) diff --git a/docs/index.html b/docs/index.html index c923bcdb4..9b62f5b05 100644 --- a/docs/index.html +++ b/docs/index.html @@ -3,24 +3,24 @@ - + netCDF4 API documentation - - - - - - -

    @@ -520,6 +521,12 @@

    Developer Install

    If the dependencies are not found in any of the paths specified by environment variables, then standard locations (such as /usr and /usr/local) are searched.

  • + if the env var NETCDF_PLUGIN_DIR is set to point to the location of the netcdf-c compression plugin shared objects, they will be installed inside the package. In this case HDF5_PLUGIN_PATH will be set to the package installation path on import, so the extra compression algorithms available in netcdf-c 4.9.0 will automatically be available. Otherwise, the user will have to set HDF5_PLUGIN_PATH explicitly to have access to the extra compression plugins (a short usage sketch follows this list).
  • run python setup.py build, then python setup.py install (as root if necessary).
  • run the tests in the 'test' directory by running python run_all.py.
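
    A minimal sketch of the fallback described in the list above; the plugin directory shown is purely illustrative and is not created or shipped by this patch:

    >>> # point HDF5 at the filter plugins before the first Dataset is opened,
    >>> # if they were not bundled into the package at build time (path is a placeholder)
    >>> import os
    >>> os.environ["HDF5_PLUGIN_PATH"] = "/usr/local/hdf5/lib/plugin"
    >>> from netCDF4 import Dataset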
    @@ -576,7 +583,7 @@

    Creating/Opening/Closing a netCDF

    Here's an example:

    -
    >>> from netCDF4 import Dataset
    +
    >>> from netCDF4 import Dataset
     >>> rootgrp = Dataset("test.nc", "w", format="NETCDF4")
     >>> print(rootgrp.data_model)
     NETCDF4
    @@ -605,7 +612,7 @@ 

    Groups in a netCDF file

    NETCDF4 formatted files support Groups, if you try to create a Group in a netCDF 3 file you will get an error message.

    -
    >>> rootgrp = Dataset("test.nc", "a")
    +
    >>> rootgrp = Dataset("test.nc", "a")
     >>> fcstgrp = rootgrp.createGroup("forecasts")
     >>> analgrp = rootgrp.createGroup("analyses")
     >>> print(rootgrp.groups)
    @@ -629,7 +636,7 @@ 

    Groups in a netCDF file

    that group. To simplify the creation of nested groups, you can use a unix-like path as an argument to Dataset.createGroup.

    -
    >>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
    +
    >>> fcstgrp1 = rootgrp.createGroup("/forecasts/model1")
     >>> fcstgrp2 = rootgrp.createGroup("/forecasts/model2")
     
    @@ -643,7 +650,7 @@

    Groups in a netCDF file

    to walk the directory tree. Note that printing the Dataset or Group object yields summary information about its contents.

    -
    >>> def walktree(top):
    +
    >>> def walktree(top):
     ...     yield top.groups.values()
     ...     for value in top.groups.values():
     ...         yield from walktree(value)
    @@ -693,7 +700,7 @@ 

    Dimensions in a netCDF file

    dimension is a new netCDF 4 feature, in netCDF 3 files there may be only one, and it must be the first (leftmost) dimension of the variable.

    -
    >>> level = rootgrp.createDimension("level", None)
    +
    >>> level = rootgrp.createDimension("level", None)
     >>> time = rootgrp.createDimension("time", None)
     >>> lat = rootgrp.createDimension("lat", 73)
     >>> lon = rootgrp.createDimension("lon", 144)
    @@ -701,7 +708,7 @@ 

    Dimensions in a netCDF file

    All of the Dimension instances are stored in a python dictionary.

    -
    >>> print(rootgrp.dimensions)
    +
    >>> print(rootgrp.dimensions)
     {'level': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0, 'time': <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0, 'lat': <class 'netCDF4._netCDF4.Dimension'>: name = 'lat', size = 73, 'lon': <class 'netCDF4._netCDF4.Dimension'>: name = 'lon', size = 144}
     
    @@ -710,7 +717,7 @@

    Dimensions in a netCDF file

    The Dimension.isunlimited method of a Dimension instance can be used to determine if the dimension is unlimited, or appendable.

    -
    >>> print(len(lon))
    +
    >>> print(len(lon))
     144
     >>> print(lon.isunlimited())
     False
    @@ -722,7 +729,7 @@ 

    Dimensions in a netCDF file

    provides useful summary info, including the name and length of the dimension, and whether it is unlimited.

    -
    >>> for dimobj in rootgrp.dimensions.values():
    +
    >>> for dimobj in rootgrp.dimensions.values():
     ...     print(dimobj)
     <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'level', size = 0
     <class 'netCDF4._netCDF4.Dimension'> (unlimited): name = 'time', size = 0
    @@ -767,7 +774,7 @@ 

    Variables in a netCDF file

    method returns an instance of the Variable class whose methods can be used later to access and set variable data and attributes.

    -
    >>> times = rootgrp.createVariable("time","f8",("time",))
    +
    >>> times = rootgrp.createVariable("time","f8",("time",))
     >>> levels = rootgrp.createVariable("level","i4",("level",))
     >>> latitudes = rootgrp.createVariable("lat","f4",("lat",))
     >>> longitudes = rootgrp.createVariable("lon","f4",("lon",))
    @@ -779,7 +786,7 @@ 

    Variables in a netCDF file

    To get summary info on a Variable instance in an interactive session, just print it.

    -
    >>> print(temp)
    +
    >>> print(temp)
     <class 'netCDF4._netCDF4.Variable'>
     float32 temp(time, level, lat, lon)
         units: K
    @@ -790,7 +797,7 @@ 

    Variables in a netCDF file

    You can use a path to create a Variable inside a hierarchy of groups.

    -
    >>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
    +
    >>> ftemp = rootgrp.createVariable("/forecasts/model1/temp","f4",("time","level","lat","lon",))
     

    If the intermediate groups do not yet exist, they will be created.

    @@ -798,7 +805,7 @@

    Variables in a netCDF file

    You can also query a Dataset or Group instance directly to obtain Group or Variable instances using paths.

    -
    >>> print(rootgrp["/forecasts/model1"])  # a Group instance
    +
    >>> print(rootgrp["/forecasts/model1"])  # a Group instance
     <class 'netCDF4._netCDF4.Group'>
     group /forecasts/model1:
         dimensions(sizes): 
    @@ -816,7 +823,7 @@ 

    Variables in a netCDF file

    All of the variables in the Dataset or Group are stored in a Python dictionary, in the same way as the dimensions:

    -
    >>> print(rootgrp.variables)
    +
    >>> print(rootgrp.variables)
     {'time': <class 'netCDF4._netCDF4.Variable'>
     float64 time(time)
     unlimited dimensions: time
    @@ -859,7 +866,7 @@ 

    Attributes in a netCDF file

    variables. Attributes can be strings, numbers or sequences. Returning to our example,

    -
    >>> import time
    +
    >>> import time
     >>> rootgrp.description = "bogus example script"
     >>> rootgrp.history = "Created " + time.ctime(time.time())
     >>> rootgrp.source = "netCDF4 python module tutorial"
    @@ -877,7 +884,7 @@ 

    Attributes in a netCDF file

    built-in dir Python function will return a bunch of private methods and attributes that cannot (or should not) be modified by the user.

    -
    >>> for name in rootgrp.ncattrs():
    +
    >>> for name in rootgrp.ncattrs():
     ...     print("Global attr {} = {}".format(name, getattr(rootgrp, name)))
     Global attr description = bogus example script
     Global attr history = Created Mon Jul  8 14:19:41 2019
    @@ -888,7 +895,7 @@ 

    Attributes in a netCDF file

    instance provides all the netCDF attribute name/value pairs in a python dictionary:

    -
    >>> print(rootgrp.__dict__)
    +
    >>> print(rootgrp.__dict__)
     {'description': 'bogus example script', 'history': 'Created Mon Jul  8 14:19:41 2019', 'source': 'netCDF4 python module tutorial'}
     
    @@ -901,7 +908,7 @@

    Writing data

    Now that you have a netCDF Variable instance, how do you put data into it? You can just treat it like an array and assign data to a slice.

    -
    >>> import numpy as np
    +
    >>> import numpy as np
     >>> lats =  np.arange(-90,91,2.5)
     >>> lons =  np.arange(-180,180,2.5)
     >>> latitudes[:] = lats
    @@ -921,7 +928,7 @@ 

    Writing data objects with unlimited dimensions will grow along those dimensions if you assign data outside the currently defined range of indices.

    -
    >>> # append along two unlimited dimensions by assigning to slice.
    +
    >>> # append along two unlimited dimensions by assigning to slice.
     >>> nlats = len(rootgrp.dimensions["lat"])
     >>> nlons = len(rootgrp.dimensions["lon"])
     >>> print("temp shape before adding data = {}".format(temp.shape))
    @@ -941,7 +948,7 @@ 

    Writing data along the level dimension of the variable temp, even though no data has yet been assigned to levels.

    -
    >>> # now, assign data to levels dimension variable.
    +
    >>> # now, assign data to levels dimension variable.
     >>> levels[:] =  [1000.,850.,700.,500.,300.,250.,200.,150.,100.,50.]
     
    @@ -954,7 +961,7 @@

    Writing data allowed, and these indices work independently along each dimension (similar to the way vector subscripts work in fortran). This means that

    -
    >>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
    +
    >>> temp[0, 0, [0,1,2,3], [0,1,2,3]].shape
     (4, 4)
     
    @@ -972,14 +979,14 @@

    Writing data

    For example,

    -
    >>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
    +
    >>> tempdat = temp[::2, [1,3,6], lats>0, lons>0]
     

    will extract time indices 0,2 and 4, pressure levels 850, 500 and 200 hPa, all Northern Hemisphere latitudes and Eastern Hemisphere longitudes, resulting in a numpy array of shape (3, 3, 36, 71).

    -
    >>> print("shape of fancy temp slice = {}".format(tempdat.shape))
    +
    >>> print("shape of fancy temp slice = {}".format(tempdat.shape))
     shape of fancy temp slice = (3, 3, 36, 71)
     
    @@ -1012,7 +1019,7 @@

    Dealing with time coordinates

    provided by cftime to do just that. Here's an example of how they can be used:

    -
    >>> # fill in times.
    +
    >>> # fill in times.
     >>> from datetime import datetime, timedelta
     >>> from cftime import num2date, date2num
     >>> dates = [datetime(2001,3,1)+n*timedelta(hours=12) for n in range(temp.shape[0])]
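
    The hunk is cut off here; a one-line continuation sketch consistent with the date2num signature (it assumes times.units and times.calendar were assigned earlier in the tutorial):

    >>> # convert the datetime objects to numeric values and store them
    >>> times[:] = date2num(dates, units=times.units, calendar=times.calendar)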
    @@ -1052,7 +1059,7 @@ 

    Reading data from a multi NETCDF4_CLASSIC format (NETCDF4 formatted multi-file datasets are not supported).

    -
    >>> for nf in range(10):
    +
    >>> for nf in range(10):
     ...     with Dataset("mftest%s.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
     ...         _ = f.createDimension("x",None)
     ...         x = f.createVariable("x","i",("x",))
    @@ -1061,7 +1068,7 @@ 

    Reading data from a multi

    Now read all the files back in at once with MFDataset

    -
    >>> from netCDF4 import MFDataset
    +
    >>> from netCDF4 import MFDataset
     >>> f = MFDataset("mftest*nc")
     >>> print(f.variables["x"][:])
     [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
    @@ -1128,22 +1135,22 @@ 

    Efficient compression of netC

    In our example, try replacing the line

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",))
     

    with

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib')
     

    and then

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',least_significant_digit=3)
     

    or with netcdf-c >= 4.9.0

    -
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
    +
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),compression='zlib',significant_digits=4)
     

    and see how much smaller the resulting files are.
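
    Since this release is what adds the non-zlib backends, here is an illustrative variant (not part of the original tutorial) using the kwargs introduced by this patch series; it assumes netcdf-c 4.9.0 was built with the corresponding filter plugins and that HDF5_PLUGIN_PATH (or the bundled plugin directory) is set up as described under Developer Install:

    >>> # sketch: zstd through the blosc filter, with bit-wise blosc shuffle
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),
    ...     compression='blosc_zstd',blosc_shuffle=2,complevel=4)
    >>> # or szip, with nearest-neighbor coding
    >>> temp = rootgrp.createVariable("temp","f4",("time","level","lat","lon",),
    ...     compression='szip',szip_coding='nn',szip_pixels_per_block=8)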

    @@ -1164,7 +1171,7 @@

    Beyond ho Since there is no native complex data type in netcdf, compound types are handy for storing numpy complex arrays. Here's an example:

    -
    >>> f = Dataset("complex.nc","w")
    +
    >>> f = Dataset("complex.nc","w")
     >>> size = 3 # length of 1-d complex array
     >>> # create sample complex data.
     >>> datac = np.exp(1j*(1.+np.linspace(0, np.pi, size)))
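
    The example is truncated here by the diff context; a minimal continuation consistent with the compound-type machinery described above (the field, type and variable names are illustrative):

    >>> # sketch: store the complex values via a two-field compound type
    >>> complex128 = np.dtype([("real", np.float64), ("imag", np.float64)])
    >>> complex128_t = f.createCompoundType(complex128, "complex128")
    >>> x_dim = f.createDimension("x_dim", None)
    >>> v = f.createVariable("cmplx_var", complex128_t, "x_dim")
    >>> data = np.empty(size, complex128)  # numpy structured array
    >>> data["real"] = datac.real
    >>> data["imag"] = datac.imag
    >>> v[:] = data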
    @@ -1200,7 +1207,7 @@ 

    Beyond ho in a Python dictionary, just like variables and dimensions. As always, printing objects gives useful summary information in an interactive session:

    -
    >>> print(f)
    +
    >>> print(f)
     <class 'netCDF4._netCDF4.Dataset'>
     root group (NETCDF4 data model, file format HDF5):
         dimensions(sizes): x_dim(3)
    @@ -1225,7 +1232,7 @@ 

    Variable-length (vlen) data types

    data type, use the Dataset.createVLType method of a Dataset or Group instance.

    -
    >>> f = Dataset("tst_vlen.nc","w")
    +
    >>> f = Dataset("tst_vlen.nc","w")
     >>> vlen_t = f.createVLType(np.int32, "phony_vlen")
     
    @@ -1235,7 +1242,7 @@

    Variable-length (vlen) data types

    but compound data types cannot. A new variable can then be created using this datatype.

    -
    >>> x = f.createDimension("x",3)
    +
    >>> x = f.createDimension("x",3)
     >>> y = f.createDimension("y",4)
     >>> vlvar = f.createVariable("phony_vlen_var", vlen_t, ("y","x"))
     
    @@ -1248,7 +1255,7 @@

    Variable-length (vlen) data types

    In this case, they contain 1-D numpy int32 arrays of random length between 1 and 10.

    -
    >>> import random
    +
    >>> import random
     >>> random.seed(54321)
     >>> data = np.empty(len(y)*len(x),object)
     >>> for n in range(len(y)*len(x)):
    @@ -1288,7 +1295,7 @@ 

    Variable-length (vlen) data types

    with fixed length greater than 1) when calling the Dataset.createVariable method.

    -
    >>> z = f.createDimension("z",10)
    +
    >>> z = f.createDimension("z",10)
     >>> strvar = f.createVariable("strvar", str, "z")
     
    @@ -1296,7 +1303,7 @@

    Variable-length (vlen) data types

    random lengths between 2 and 12 characters, and the data in the object array is assigned to the vlen string variable.

    -
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
    +
    >>> chars = "1234567890aabcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
     >>> data = np.empty(10,"O")
     >>> for n in range(10):
     ...     stringlen = random.randint(2,12)
    @@ -1335,7 +1342,7 @@ 

    Enum data type

    values and their names are used to define an Enum data type using Dataset.createEnumType.

    -
    >>> nc = Dataset('clouds.nc','w')
    +
    >>> nc = Dataset('clouds.nc','w')
     >>> # python dict with allowed values and their names.
     >>> enum_dict = {'Altocumulus': 7, 'Missing': 255,
     ... 'Stratus': 2, 'Clear': 0,
    @@ -1353,7 +1360,7 @@ 

    Enum data type

    is made to write an integer value not associated with one of the specified names.

    -
    >>> time = nc.createDimension('time',None)
    +
    >>> time = nc.createDimension('time',None)
     >>> # create a 1d variable of type 'cloud_type'.
     >>> # The fill_value is set to the 'Missing' named value.
     >>> cloud_var = nc.createVariable('primary_cloud',cloud_type,'time',
    @@ -1390,7 +1397,7 @@ 

    Parallel IO

    available. To use parallel IO, your program must be running in an MPI environment using mpi4py.

    -
    >>> from mpi4py import MPI
    +
    >>> from mpi4py import MPI
     >>> import numpy as np
     >>> from netCDF4 import Dataset
     >>> rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)
    @@ -1402,7 +1409,7 @@ 

    Parallel IO

    when a new dataset is created or an existing dataset is opened, use the parallel keyword to enable parallel access.

    -
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
    +
    >>> nc = Dataset('parallel_test.nc','w',parallel=True)
     

    The optional comm keyword may be used to specify a particular @@ -1410,7 +1417,7 @@

    Parallel IO

    can now write to the file independently. In this example the process rank is written to a different variable index on each task

    -
    >>> d = nc.createDimension('dim',4)
    +
    >>> d = nc.createDimension('dim',4)
     >>> v = nc.createVariable('var', np.int64, 'dim')
     >>> v[rank] = rank
     >>> nc.close()
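
    The writes above use independent access, which is the default. Variable.set_collective (an existing method, not changed by this patch) could have been toggled before the writes instead; a minimal sketch:

    >>> # sketch: switch the variable to collective parallel access around the write
    >>> v.set_collective(True)
    >>> v[rank] = rank
    >>> v.set_collective(False)  # back to independent access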
    @@ -1477,7 +1484,7 @@ 

    Dealing with strings

    stringtochar is used to convert the numpy string array to an array of characters with one more dimension. For example,

    -
    >>> from netCDF4 import stringtochar
    +
    >>> from netCDF4 import stringtochar
     >>> nc = Dataset('stringtest.nc','w',format='NETCDF4_CLASSIC')
     >>> _ = nc.createDimension('nchars',3)
     >>> _ = nc.createDimension('nstrings',None)
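
    The hunk stops here; a short continuation sketch showing the stringtochar conversion the text describes (the variable name and sample data are illustrative):

    >>> v = nc.createVariable('strings','S1',('nstrings','nchars'))
    >>> datain = np.array(['foo','bar'],dtype='S3')  # np was imported earlier in the tutorial
    >>> v[:] = stringtochar(datain)  # the extra 'nchars' dimension holds the characters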
    @@ -1510,7 +1517,7 @@ 

    Dealing with strings

    character array dtype under the hood when creating the netcdf compound type. Here's an example:

    -
    >>> nc = Dataset('compoundstring_example.nc','w')
    +
    >>> nc = Dataset('compoundstring_example.nc','w')
     >>> dtype = np.dtype([('observation', 'f4'),
     ...                      ('station_name','S10')])
     >>> station_data_t = nc.createCompoundType(dtype,'station_data')
    @@ -1555,7 +1562,7 @@ 

    In-memory (diskless) Datasets

    object representing the Dataset. Below are examples illustrating both approaches.

    -
    >>> # create a diskless (in-memory) Dataset,
    +
    >>> # create a diskless (in-memory) Dataset,
     >>> # and persist the file to disk when it is closed.
     >>> nc = Dataset('diskless_example.nc','w',diskless=True,persist=True)
     >>> d = nc.createDimension('x',None)
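
    Only the first of the two approaches survives the diff context above; a rough sketch of the second, reading a file image from a bytes buffer via the memory kwarg of Dataset.__init__ (the filenames are illustrative):

    >>> # sketch: open an in-memory Dataset from a bytes buffer
    >>> with open('diskless_example.nc', 'rb') as fobj:
    ...     nc_bytes = fobj.read()
    >>> nc = Dataset('inmemory.nc', memory=nc_bytes)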
    @@ -1617,7 +1624,7 @@ 

    In-memory (diskless) Datasets

    the parallel IO example, which is in examples/mpi_example.py. Unit tests are in the test directory.

    -

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    +

    contact: Jeffrey Whitaker jeffrey.s.whitaker@noaa.gov

    copyright: 2008 by Jeffrey Whitaker.

    @@ -1630,7 +1637,7 @@

    In-memory (diskless) Datasets

    View Source -
    # init for netCDF4. package
    +            
    # init for netCDF4. package
     # Docstring comes from extension module _netCDF4.
     from ._netCDF4 import *
     # Need explicit imports for names beginning with underscores
    @@ -1645,8 +1652,8 @@ 

    In-memory (diskless) Datasets

    import os __all__ =\ ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache'] -# if HDF5_PLUGIN_PATH not set, point to plugins directory inside package -if 'HDF5_PLUGIN_PATH' not in os.environ: +# if HDF5_PLUGIN_PATH not set, point to package path if libh5noop.so exists there +if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'libh5noop.so')): os.environ['HDF5_PLUGIN_PATH']=__path__[0]
[docs/index.html diff, continued: regenerated pdoc reference pages for Dataset, Variable, Dimension, Group, MFDataset, MFTime, CompoundType, VLType, EnumType and the module-level helper functions. Apart from HTML anchors, pdoc attribute markup and the netCDF4.default_fillvals link targets, the rendered docstring text is unchanged in these hunks. The final hunk covers set_chunk_cache:]

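The MFDataset entry summarized above is easiest to follow with a concrete, runnable sketch. The snippet below is not part of the patch; the file names are illustrative, and it assumes the files share one unlimited aggregation dimension, as the docstring describes.

>>> import numpy as np
>>> from netCDF4 import Dataset, MFDataset
>>> for nf in range(10):  # ten files, each holding 10 values of "x"
...     with Dataset("mftest%d.nc" % nf, "w", format="NETCDF4_CLASSIC") as f:
...         _ = f.createDimension("x", None)           # shared unlimited dimension
...         x = f.createVariable("x", "i4", ("x",))
...         x[0:10] = np.arange(nf*10, 10*(nf + 1))
>>> ds = MFDataset("mftest*nc")   # variables spanning the ten files appear as one
>>> ds.variables["x"].shape
(100,)
>>> ds.close()

MFTime can then be applied to an aggregated time variable to impose a common units/calendar across the files, as described in the MFTime entry.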
diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx
index b9897aa34..d1d3e824a 100644
--- a/src/netCDF4/_netCDF4.pyx
+++ b/src/netCDF4/_netCDF4.pyx
@@ -58,6 +58,12 @@ types) are not supported.
 If the dependencies are not found
 in any of the paths specified by environment variables, then
 standard locations (such as `/usr` and `/usr/local`) are searched.
+ - if the env var `NETCDF_PLUGIN_DIR` is set to point to the location netcdf-c compression
+   plugin shared objects, they will be installed inside the package. In this
+   case `HDF5_PLUGIN_PATH` will be set to the package installation path on import,
+   so the extra compression algorithms available in netcdf-c 4.9.0 will automatically
+   be available. Otherwise, the user will have to set `HDF5_PLUGIN_PATH` explicitly
+   to have access to the extra compression plugins.
 - run `python setup.py build`, then `python setup.py install` (as root if
   necessary).
 - run the tests in the 'test' directory by running `python run_all.py`.

From 073956ec5748f44b869d34890082c325c9653998 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Wed, 18 May 2022 10:07:26 -0600
Subject: [PATCH 72/92] fix typo

---
 test/run_all.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/test/run_all.py b/test/run_all.py
index a76764d97..289b75aae 100755
--- a/test/run_all.py
+++ b/test/run_all.py
@@ -28,13 +28,13 @@
     sys.stdout.write('not running tst_compression_quant.py ...\n')
 if not __has_zstandard_support__:
     test_files.remove('tst_compression_zstd.py')
-    sys.stdout.write('not running tst_compression_quant.py ...\n')
+    sys.stdout.write('not running tst_compression_zstd.py ...\n')
 if not __has_bzip2_support__:
     test_files.remove('tst_compression_bzip2.py')
     sys.stdout.write('not running tst_compression_bzip2.py ...\n')
 if not __has_blosc_support__:
     test_files.remove('tst_compression_blosc.py')
-    sys.stdout.write('not running tst_compression_bzip2.py ...\n')
+    sys.stdout.write('not running tst_compression_blosc2.py ...\n')
 if not __has_szip_support__:
     test_files.remove('tst_compression_szip.py')
     sys.stdout.write('not running tst_compression_szip.py ...\n')

From a5ae6791f51ada9c95cde686c883accdc51ec814 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Fri, 20 May 2022 10:54:05 -0600
Subject: [PATCH 73/92] update for latest changes in plugin installation

---
 setup.py                 | 2 +-
 src/netCDF4/__init__.py  | 2 +-
 src/netCDF4/_netCDF4.pyx | 6 +++---
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/setup.py b/setup.py
index 81ab37576..76a1d9fd6 100644
--- a/setup.py
+++ b/setup.py
@@ -686,7 +686,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
 # (should point to location of .so files built by netcdf-c)
 if os.environ.get("NETCDF_PLUGIN_DIR"):
     plugin_dir = os.environ.get("NETCDF_PLUGIN_DIR")
-    plugins = glob.glob(os.path.join(plugin_dir, "*.so"))
+    plugins = glob.glob(os.path.join(plugin_dir, "lib__nc*"))
     if not plugins:
         sys.stdout.write('no .so files in NETCDF_PLUGIN_DIR, no plugin shared objects installed\n')
         data_files = []

diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py
index b6f9d2293..4b880cd92 100644
--- a/src/netCDF4/__init__.py
+++ b/src/netCDF4/__init__.py
@@ -14,5 +14,5 @@
 ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
 # if HDF5_PLUGIN_PATH not set, point to package path if libh5noop.so exists there
-if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'libh5noop.so')):
+if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.so')):
     os.environ['HDF5_PLUGIN_PATH']=__path__[0]

diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx
index d1d3e824a..001fcd122 100644
--- a/src/netCDF4/_netCDF4.pyx
+++ b/src/netCDF4/_netCDF4.pyx
@@ -58,10 +58,10 @@ types) are not supported.
 If the dependencies are not found
 in any of the paths specified by environment variables, then
 standard locations (such as `/usr` and `/usr/local`) are searched.
- - if the env var `NETCDF_PLUGIN_DIR` is set to point to the location netcdf-c compression
-   plugin shared objects, they will be installed inside the package. In this
+ - if the env var `NETCDF_PLUGIN_DIR` is set to point to the location of the netcdf-c compression
+   plugins built by netcdf >= 4.9.0, they will be installed inside the package. In this
    case `HDF5_PLUGIN_PATH` will be set to the package installation path on import,
-   so the extra compression algorithms available in netcdf-c 4.9.0 will automatically
+   so the extra compression algorithms available in netcdf-c >= 4.9.0 will automatically
    be available. Otherwise, the user will have to set `HDF5_PLUGIN_PATH` explicitly
    to have access to the extra compression plugins.
 - run `python setup.py build`, then `python setup.py install` (as root if

From 4e837431c29b312e84211f84a365a3518235777a Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Sat, 21 May 2022 09:17:39 -0600
Subject: [PATCH 74/92] update docs

---
 Changelog                | 2 +-
 README.md                | 2 +-
 setup.py                 | 10 +++++-----
 src/netCDF4/_netCDF4.pyx | 2 +-
 4 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/Changelog b/Changelog
index 435a6bdc6..4caef91eb 100644
--- a/Changelog
+++ b/Changelog
@@ -18,7 +18,7 @@
    'szip_mask' and 'szip_pixels_per_block' kwargs also added.
    compression='zlib' is equivalent to (the now deprecated) zlib=True.
    If the environment variable NETCDF_PLUGIN_DIR is set to point to the
-   directory with the HDF5 plugin .so files, then the compression plugins will
+   directory with the compression plugin lib__nc* files, then the compression plugins will
    be installed within the package and be automatically available (the binary
    wheels have this). Otherwise, the environment variable HDF5_PLUGIN_PATH
    needs to be se to point to plugins in order to use the new compression

diff --git a/README.md b/README.md
index 453648b80..9f714903f 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ For details on the latest updates, see the [Changelog](https://github.com/Unidata/netcdf4-python/blob/master/Changelog).
 for quantization (bit-grooming and bit-rounding) functionality in netcdf-c 4.9.0 which can
 dramatically improve compression. Dataset.createVariable now accepts dimension instances (instead
 of just dimension names). 'compression' kwarg added to Dataset.createVariable to support szip as
-well as new compression algorithms available in netcdf-c 4.9.0 through HDF5 filter plugsins (such
+well as new compression algorithms available in netcdf-c 4.9.0 through compression plugins (such
 as zstd, bzip2 and blosc). Working arm64 wheels for Apple M1 Silicon now available on pypi.
 
 10/31/2021: Version [1.5.8](https://pypi.python.org/pypi/netCDF4/1.5.8) released. Fix Enum bug, add binary wheels for aarch64 and python 3.10.
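The Changelog and README hunks above describe the new `compression`, `significant_digits` and `quantize_mode` keyword arguments only in prose; the sketch below is not part of the patch (names are illustrative, and it assumes a netCDF4 1.6.0 build against netcdf-c 4.9.0) and simply shows how they fit together in Dataset.createVariable.

>>> import numpy as np
>>> from netCDF4 import Dataset
>>> nc = Dataset("compresstest.nc", "w")   # illustrative file name
>>> _ = nc.createDimension("x", 1000)
>>> # zlib compression plus BitGroom quantization, keeping 3 significant digits
>>> v = nc.createVariable("v", "f4", ("x",), compression="zlib", complevel=4,
...                       significant_digits=3, quantize_mode="BitGroom")
>>> v[:] = np.random.uniform(size=1000)
>>> nc.close()

Passing compression='zstd', 'bzip2' or a blosc variant instead of 'zlib' additionally requires the corresponding filter plugin to be findable via HDF5_PLUGIN_PATH, which is what the plugin-related patches below wire up.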
diff --git a/setup.py b/setup.py
index 76a1d9fd6..5c61f4a3a 100644
--- a/setup.py
+++ b/setup.py
@@ -682,21 +682,21 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
 else:
     ext_modules = None
 
-# if NETCDF_PLUGIN_DIR set, install netcdf-c plugin shared objects in package
-# (should point to location of .so files built by netcdf-c)
+# if NETCDF_PLUGIN_DIR set, install netcdf-c compression plugins inside package
+# (should point to location of lib__nc* files built by netcdf-c)
 if os.environ.get("NETCDF_PLUGIN_DIR"):
     plugin_dir = os.environ.get("NETCDF_PLUGIN_DIR")
     plugins = glob.glob(os.path.join(plugin_dir, "lib__nc*"))
     if not plugins:
-        sys.stdout.write('no .so files in NETCDF_PLUGIN_DIR, no plugin shared objects installed\n')
+        sys.stdout.write('no plugin files in NETCDF_PLUGIN_DIR, not installing..\n')
         data_files = []
     else:
         data_files = plugins
-        sys.stdout.write('installing plugin shared objects from %s ...\n' % plugin_dir)
+        sys.stdout.write('installing netcdf compression plugins from %s ...\n' % plugin_dir)
         sofiles = [os.path.basename(sofilepath) for sofilepath in data_files]
         sys.stdout.write(repr(sofiles)+'\n')
 else:
-    sys.stdout.write('NETCDF_PLUGIN_DIR not set, no plugin shared objects installed\n')
+    sys.stdout.write('NETCDF_PLUGIN_DIR not set, no netcdf compression plugins installed\n')
     data_files = []
 
 setup(name="netCDF4",

diff --git a/src/netCDF4/_netCDF4.pyx b/src/netCDF4/_netCDF4.pyx
index 001fcd122..8a006f457 100644
--- a/src/netCDF4/_netCDF4.pyx
+++ b/src/netCDF4/_netCDF4.pyx
@@ -59,7 +59,7 @@ types) are not supported.
 in any of the paths specified by environment variables, then
 standard locations (such as `/usr` and `/usr/local`) are searched.
 - if the env var `NETCDF_PLUGIN_DIR` is set to point to the location of the netcdf-c compression
-   plugin shared objects built by netcdf >= 4.9.0, they will be installed inside the package. In this
+   plugins built by netcdf >= 4.9.0, they will be installed inside the package. In this
   case `HDF5_PLUGIN_PATH` will be set to the package installation path on import,
   so the extra compression algorithms available in netcdf-c >= 4.9.0 will automatically
   be available. Otherwise, the user will have to set `HDF5_PLUGIN_PATH` explicitly

From d5af6ebdd19b4fa1840edb9a06ebf42989d10433 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 17:21:33 -0600
Subject: [PATCH 75/92] update netcdf-c version

---
 .github/workflows/build.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 496012eda..d3cafded6 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -6,7 +6,7 @@ jobs:
     runs-on: ubuntu-latest
     env:
       PNETCDF_VERSION: 1.12.1
-      NETCDF_VERSION: 4.8.0
+      NETCDF_VERSION: 4.9.0
       NETCDF_DIR: ${{ github.workspace }}/..
       NETCDF_EXTRA_CONFIG: --enable-pnetcdf
       CC: mpicc.mpich

From 4d3a51062ee18e36a9740a88db752ce7f7788d39 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 17:31:46 -0600
Subject: [PATCH 76/92] update netcdf-c download URL

---
 .github/workflows/build.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index d3cafded6..01861228f 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -36,7 +36,7 @@ jobs:
         make install
         popd
         echo "Download and build netCDF version ${NETCDF_VERSION}"
-        wget ftp://ftp.unidata.ucar.edu/pub/netcdf/netcdf-c-${NETCDF_VERSION}.tar.gz
+        wget https://downloads.unidata.ucar.edu/netcdf-c/4.9.0/netcdf-c-${NETCDF_VERSION}.tar.gz
         tar -xzf netcdf-c-${NETCDF_VERSION}.tar.gz
         pushd netcdf-c-${NETCDF_VERSION}
         export CPPFLAGS="-I/usr/include/hdf5/mpich -I${NETCDF_DIR}/include"
@@ -61,6 +61,7 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
+        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/.libs
         python setup.py install
     - name: Test
       run: |

From 00a0acb5545032fa867861d8e8732208b7a8cdb3 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 17:37:46 -0600
Subject: [PATCH 77/92] build plugins

---
 .github/workflows/build.yml | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 01861228f..4dff81c0d 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -42,9 +42,10 @@ jobs:
         export CPPFLAGS="-I/usr/include/hdf5/mpich -I${NETCDF_DIR}/include"
         export LDFLAGS="-L${NETCDF_DIR}/lib"
         export LIBS="-lhdf5_mpich_hl -lhdf5_mpich -lm -lz"
-        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 $NETCDF_EXTRA_CONFIG
+        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 --with-plugin-dir $NETCDF_EXTRA_CONFIG
         make -j 2
         make install
+        ls -l plugins/.libs
         popd

 # - name: The job has failed

From 4377e63081077caf454ffd84f4c696dc60297905 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 17:44:19 -0600
Subject: [PATCH 78/92] update plugin dir

---
 .github/workflows/build.yml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 4dff81c0d..08d00132c 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -45,7 +45,7 @@ jobs:
         ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 --with-plugin-dir $NETCDF_EXTRA_CONFIG
         make -j 2
         make install
-        ls -l plugins/.libs
+        ls -l plugins/plugindir
        popd

 # - name: The job has failed
@@ -62,7 +62,7 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/.libs
+        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/plugindir
         python setup.py install
     - name: Test
       run: |

From 4e5ee594a0c0910ae479619e93de11740534297f Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 17:55:05 -0600
Subject: [PATCH 79/92] update plugindir

---
 .github/workflows/build_master.yml | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml
index 10e5fa508..1846524f0 100644
--- a/.github/workflows/build_master.yml
+++ b/.github/workflows/build_master.yml
@@ -31,7 +31,7 @@ jobs:
         export LDFLAGS="-L${NETCDF_DIR}/lib"
         export LIBS="-lhdf5_mpich_hl -lhdf5_mpich -lm -lz"
         autoreconf -i
-        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4
+        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4
         make -j 2
         make install
         popd
@@ -50,12 +50,12 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/.libs
+        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/plugindir
         python setup.py install
     - name: Test
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        #export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/.libs
+        #export HDF5_PLUGIN_PATH=${NETCDF_DIR}/plugins/plugindir
         python checkversion.py
 # serial
         cd test

From f6e00cd982ad90cc340a360bdcefd3b4a150f7c6 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 18:17:22 -0600
Subject: [PATCH 80/92] update

---
 .github/workflows/build_master.yml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml
index 1846524f0..bf0ce7a6e 100644
--- a/.github/workflows/build_master.yml
+++ b/.github/workflows/build_master.yml
@@ -51,6 +51,8 @@ jobs:
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
         export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/plugindir
+        ls -l /home/runner/work/netcdf4-python/netcdf4-python/netcdf-c/plugins/plugindir
+        ls -l $NETCDF_PLUGIN_DIR
         python setup.py install
     - name: Test
       run: |

From 6a22864bdfbf486ff5962c1d5d6728518dd9f04a Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 18:54:47 -0600
Subject: [PATCH 81/92] update plugindir

---
 .github/workflows/build_master.yml | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml
index bf0ce7a6e..3f4d7c2b6 100644
--- a/.github/workflows/build_master.yml
+++ b/.github/workflows/build_master.yml
@@ -50,8 +50,7 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/plugindir
-        ls -l /home/runner/work/netcdf4-python/netcdf4-python/netcdf-c/plugins/plugindir
+        export NETCDF_PLUGIN_DIR=${{ github.workspace }}/netcdf-c/plugins/plugindir
         ls -l $NETCDF_PLUGIN_DIR
         python setup.py install
     - name: Test

From 7a8c94c8a7a924972e32a4e27262d622bc91a7b5 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 19:23:20 -0600
Subject: [PATCH 82/92] fix plugin installation for 4.9.0

---
 .github/workflows/build.yml        | 5 ++---
 .github/workflows/build_master.yml | 1 -
 2 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 08d00132c..ad6a011eb 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -42,10 +42,9 @@ jobs:
         export CPPFLAGS="-I/usr/include/hdf5/mpich -I${NETCDF_DIR}/include"
         export LDFLAGS="-L${NETCDF_DIR}/lib"
         export LIBS="-lhdf5_mpich_hl -lhdf5_mpich -lm -lz"
-        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 --with-plugin-dir $NETCDF_EXTRA_CONFIG
+        ./configure --prefix $NETCDF_DIR --enable-netcdf-4 --enable-shared --enable-dap --enable-parallel4 $NETCDF_EXTRA_CONFIG
         make -j 2
         make install
-        ls -l plugins/plugindir
         popd

 # - name: The job has failed
@@ -62,7 +61,7 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        export NETCDF_PLUGIN_DIR=${NETCDF_DIR}/plugins/plugindir
+        export NETCDF_PLUGIN_DIR=${{ github.workspace }}/netcdf-c/plugins/plugindir
         python setup.py install
     - name: Test
       run: |

diff --git a/.github/workflows/build_master.yml b/.github/workflows/build_master.yml
index 3f4d7c2b6..00b3a52d4 100644
--- a/.github/workflows/build_master.yml
+++ b/.github/workflows/build_master.yml
@@ -51,7 +51,6 @@ jobs:
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
         export NETCDF_PLUGIN_DIR=${{ github.workspace }}/netcdf-c/plugins/plugindir
-        ls -l $NETCDF_PLUGIN_DIR
         python setup.py install
     - name: Test
       run: |

From b80f43d0a6755a1298897e3f2a7f24eb30261a98 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Fri, 17 Jun 2022 19:35:39 -0600
Subject: [PATCH 83/92] update

---
 .github/workflows/build.yml | 4 ++--
 Changelog                   | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index ad6a011eb..11e218157 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -26,7 +26,7 @@ jobs:
     - name: Install Ubuntu Dependencies
       run: |
         sudo apt-get update
-        sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev
+        sudo apt-get install mpich libmpich-dev libhdf5-mpich-dev libcurl4-openssl-dev bzip2 libsnappy-dev libblosc-dev libzstd-dev
        echo "Download and build PnetCDF version ${PNETCDF_VERSION}"
        wget https://parallel-netcdf.github.io/Release/pnetcdf-${PNETCDF_VERSION}.tar.gz
        tar -xzf pnetcdf-${PNETCDF_VERSION}.tar.gz
@@ -61,7 +61,7 @@ jobs:
     - name: Install netcdf4-python
       run: |
         export PATH=${NETCDF_DIR}/bin:${PATH}
-        export NETCDF_PLUGIN_DIR=${{ github.workspace }}/netcdf-c/plugins/plugindir
+        export NETCDF_PLUGIN_DIR=${{ github.workspace }}/netcdf-c-${NETCDF_VERSION}/plugins/plugindir
         python setup.py install
     - name: Test
       run: |

diff --git a/Changelog b/Changelog
index 4caef91eb..08d7e2cfd 100644
--- a/Changelog
+++ b/Changelog
@@ -21,7 +21,7 @@
    directory with the compression plugin lib__nc* files, then the compression plugins will
    be installed within the package and be automatically available (the binary
    wheels have this). Otherwise, the environment variable HDF5_PLUGIN_PATH
-   needs to be se to point to plugins in order to use the new compression
+   needs to be set at runtime to point to plugins in order to use the new compression
    options.
  * MFDataset did not aggregate 'name' variable attribute (issue #1153).
 * issue warning instead of raising an exception if missing_value or

From a95f1359dbdb98ce9923c84787ed915d946427c1 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Sun, 19 Jun 2022 18:10:43 -0600
Subject: [PATCH 84/92] update

---
 src/netCDF4/__init__.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py
index 4b880cd92..2f3463fc0 100644
--- a/src/netCDF4/__init__.py
+++ b/src/netCDF4/__init__.py
@@ -15,4 +15,5 @@
 ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
 # if HDF5_PLUGIN_PATH not set, point to package path if libh5noop.so exists there
 if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.so')):
+    print('setting HDF5_PLUGIN_PATH to %s' % __path__[0])
     os.environ['HDF5_PLUGIN_PATH']=__path__[0]

From cf202781a1e590334e6547d7b05aa71eab2cee5c Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Mon, 20 Jun 2022 13:16:25 -0600
Subject: [PATCH 85/92] check for dylib extension on plugins

---
 src/netCDF4/__init__.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py
index 2f3463fc0..d0009398d 100644
--- a/src/netCDF4/__init__.py
+++ b/src/netCDF4/__init__.py
@@ -13,7 +13,8 @@ import os
 __all__ =\
 ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
-# if HDF5_PLUGIN_PATH not set, point to package path if libh5noop.so exists there
-if 'HDF5_PLUGIN_PATH' not in os.environ and os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.so')):
-    print('setting HDF5_PLUGIN_PATH to %s' % __path__[0])
+# if HDF5_PLUGIN_PATH not set, point to package path if plugins live there
+if 'HDF5_PLUGIN_PATH' not in os.environ and\
+   (os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.so')) or\
+   os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.dylib'))):
     os.environ['HDF5_PLUGIN_PATH']=__path__[0]

From 0eed426bed472e7f7893e22de1b1b93a79a613d4 Mon Sep 17 00:00:00 2001
From: Jeff Whitaker
Date: Mon, 20 Jun 2022 13:23:08 -0600
Subject: [PATCH 86/92] use package_data to install plugins (works with pip)

---
 setup.py | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/setup.py b/setup.py
index 5c61f4a3a..e9259f2b1 100644
--- a/setup.py
+++ b/setup.py
@@ -1,5 +1,6 @@
 import os, sys, subprocess, glob
 import os.path as osp
+import shutil
 import configparser
 from setuptools import setup, Extension
 from distutils.dist import Distribution
@@ -695,6 +696,8 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
         sys.stdout.write('installing netcdf compression plugins from %s ...\n' % plugin_dir)
         sofiles = [os.path.basename(sofilepath) for sofilepath in data_files]
         sys.stdout.write(repr(sofiles)+'\n')
+        for f in data_files:
+            shutil.copy(f, osp.join(os.getcwd(),osp.join('src','netCDF4')))
 else:
     sys.stdout.write('NETCDF_PLUGIN_DIR not set, no netcdf compression plugins installed\n')
     data_files = []
@@ -724,6 +727,8 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
           "Operating System :: OS Independent"],
       packages=['netCDF4'],
       package_dir={'':'src'},
-      data_files=[('netCDF4',data_files)],
+      #data_files=[('netCDF4',data_files)], # doesn't work with pip install
+      include_package_data = True,
+      package_data={"netCDF4": ["lib__nc*"]},
       ext_modules=ext_modules,
       **setuptools_extra_kwargs)

From c476a39b8395000e3dbe2f0d3a1cfae8e914ef83 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Mon, 20 Jun 2022 20:24:19 -0600
Subject: [PATCH 87/92] remove plugins copied from outside source tree

---
 setup.py | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/setup.py b/setup.py
index e9259f2b1..745ef9867 100644
--- a/setup.py
+++ b/setup.py
@@ -685,6 +685,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
 # if NETCDF_PLUGIN_DIR set, install netcdf-c compression plugins inside package
 # (should point to location of lib__nc* files built by netcdf-c)
+copied_plugins=False
 if os.environ.get("NETCDF_PLUGIN_DIR"):
     plugin_dir = os.environ.get("NETCDF_PLUGIN_DIR")
     plugins = glob.glob(os.path.join(plugin_dir, "lib__nc*"))
@@ -698,6 +699,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
         sys.stdout.write(repr(sofiles)+'\n')
         for f in data_files:
             shutil.copy(f, osp.join(os.getcwd(),osp.join('src','netCDF4')))
+        copied_plugins=True
 else:
     sys.stdout.write('NETCDF_PLUGIN_DIR not set, no netcdf compression plugins installed\n')
     data_files = []
@@ -732,3 +734,8 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
       package_data={"netCDF4": ["lib__nc*"]},
       ext_modules=ext_modules,
       **setuptools_extra_kwargs)
+
+# remove plugin files copied from outside source tree
+if copied_plugins:
+    for f in sofiles:
+        os.remove(osp.join(osp.join('src','netCDF4'),f))

From 519eff15622dbbb0501f8ee4b00d2d44ce74b354 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Tue, 21 Jun 2022 09:24:03 -0600
Subject: [PATCH 88/92] bypass plugins tests with env var NO_PLUGINS

---
 test/run_all.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/test/run_all.py b/test/run_all.py
index 289b75aae..1dfb5309a 100755
--- a/test/run_all.py
+++ b/test/run_all.py
@@ -26,15 +26,15 @@
 if not __has_quantization_support__:
     test_files.remove('tst_compression_quant.py')
     sys.stdout.write('not running tst_compression_quant.py ...\n')
-if not __has_zstandard_support__:
+if not __has_zstandard_support__ or os.getenv('NO_PLUGINS'):
     test_files.remove('tst_compression_zstd.py')
     sys.stdout.write('not running tst_compression_zstd.py ...\n')
-if not __has_bzip2_support__:
+if not __has_bzip2_support__ or os.getenv('NO_PLUGINS'):
     test_files.remove('tst_compression_bzip2.py')
     sys.stdout.write('not running tst_compression_bzip2.py ...\n')
-if not __has_blosc_support__:
+if not __has_blosc_support__ or os.getenv('NO_PLUGINS'):
     test_files.remove('tst_compression_blosc.py')
-    sys.stdout.write('not running tst_compression_blosc2.py ...\n')
+    sys.stdout.write('not running tst_compression_blosc.py ...\n')
 if not __has_szip_support__:
     test_files.remove('tst_compression_szip.py')
     sys.stdout.write('not running tst_compression_szip.py ...\n')

From ef100618d851d284547ae73e958e7e28e4f26d90 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Thu, 23 Jun 2022 16:00:00 -0600
Subject: [PATCH 89/92] install plugins in 'plugins' subdir

---
 setup.py                | 13 +++++++------
 src/netCDF4/__init__.py |  6 +++---
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/setup.py b/setup.py
index 745ef9867..3f1b593cc 100644
--- a/setup.py
+++ b/setup.py
@@ -2,7 +2,7 @@
 import os.path as osp
 import shutil
 import configparser
-from setuptools import setup, Extension
+from setuptools import setup, Extension, find_namespace_packages
 from distutils.dist import Distribution
 
 setuptools_extra_kwargs = {
@@ -698,7 +698,7 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
         sofiles = [os.path.basename(sofilepath) for sofilepath in data_files]
         sys.stdout.write(repr(sofiles)+'\n')
         for f in data_files:
-            shutil.copy(f, osp.join(os.getcwd(),osp.join('src','netCDF4')))
+            shutil.copy(f, osp.join(os.getcwd(),osp.join(osp.join('src','netCDF4'),'plugins')))
         copied_plugins=True
 else:
     sys.stdout.write('NETCDF_PLUGIN_DIR not set, no netcdf compression plugins installed\n')
@@ -727,15 +727,16 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
           "Topic :: Software Development :: Libraries :: Python Modules",
           "Topic :: System :: Archiving :: Compression",
           "Operating System :: OS Independent"],
-      packages=['netCDF4'],
+      #packages=['netCDF4'],
+      packages=find_namespace_packages(where="src"),
       package_dir={'':'src'},
       #data_files=[('netCDF4',data_files)], # doesn't work with pip install
-      include_package_data = True,
-      package_data={"netCDF4": ["lib__nc*"]},
+      #include_package_data = True,
+      package_data={"netCDF4.plugins": ["lib__nc*"]},
       ext_modules=ext_modules,
       **setuptools_extra_kwargs)
 
 # remove plugin files copied from outside source tree
 if copied_plugins:
     for f in sofiles:
-        os.remove(osp.join(osp.join('src','netCDF4'),f))
+        os.remove(osp.join(osp.join(osp.join('src','netCDF4'),'plugins'),f))

diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py
index d0009398d..46df12db2 100644
--- a/src/netCDF4/__init__.py
+++ b/src/netCDF4/__init__.py
@@ -15,6 +15,6 @@
 ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
 # if HDF5_PLUGIN_PATH not set, point to package path if plugins live there
 if 'HDF5_PLUGIN_PATH' not in os.environ and\
-   (os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.so')) or\
-   os.path.exists(os.path.join(__path__[0],'lib__nczhdf5filters.dylib'))):
-    os.environ['HDF5_PLUGIN_PATH']=__path__[0]
+   (os.path.exists(os.path.join(os.path.join(__path__[0],'plugins'),'lib__nczhdf5filters.so')) or\
+   os.path.exists(os.path.join(os.path.join(__path__[0],'plugins'),'lib__nczhdf5filters.dylib'))):
+    os.environ['HDF5_PLUGIN_PATH']=os.path.join(__path__[0],'plugins')

From ba05ffc73c4a74d38687ebaacd8ad1ad031080d6 Mon Sep 17 00:00:00 2001
From: jswhit
Date: Thu, 23 Jun 2022 16:14:08 -0600
Subject: [PATCH 90/92] add an empty file to plugins dir so it ends up in sdist

---
 MANIFEST.in | 1 +
 setup.py    | 7 ++++---
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/MANIFEST.in b/MANIFEST.in
index c27bb8192..e3497466e 100644
--- a/MANIFEST.in
+++ b/MANIFEST.in
@@ -13,6 +13,7 @@ include src/netCDF4/__init__.py
 include src/netCDF4/_netCDF4.pyx
 exclude src/netCDF4/_netCDF4.c
 include src/netCDF4/utils.py
+include src/netCDF4/plugins/empty.txt
 include include/netCDF4.pxi
 include include/mpi-compat.h
 include include/membuf.pyx

diff --git a/setup.py b/setup.py
index 3f1b593cc..7853b5c09 100644
--- a/setup.py
+++ b/setup.py
@@ -697,9 +697,10 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
         sys.stdout.write('installing netcdf compression plugins from %s ...\n' % plugin_dir)
         sofiles = [os.path.basename(sofilepath) for sofilepath in data_files]
         sys.stdout.write(repr(sofiles)+'\n')
-        for f in data_files:
-            shutil.copy(f, osp.join(os.getcwd(),osp.join(osp.join('src','netCDF4'),'plugins')))
-        copied_plugins=True
+        if 'sdist' not in sys.argv[1:] and 'clean' not in sys.argv[1:] and '--version' not in sys.argv[1:]:
+            for f in data_files:
+                shutil.copy(f, osp.join(os.getcwd(),osp.join(osp.join('src','netCDF4'),'plugins')))
+            copied_plugins=True
 else:
     sys.stdout.write('NETCDF_PLUGIN_DIR not set, no netcdf compression plugins installed\n')
     data_files = []

From e6e7a577779e91752ae793ba2ef26ff2498df9bb Mon Sep 17 00:00:00 2001
From: jswhit
Date: Thu, 23 Jun 2022 16:26:49 -0600
Subject: [PATCH 91/92] add empty file to plugins dir

---
 setup.py                      | 4 +++-
 src/netCDF4/plugins/empty.txt | 0
 2 files changed, 3 insertions(+), 1 deletion(-)
 create mode 100644 src/netCDF4/plugins/empty.txt

diff --git a/setup.py b/setup.py
index 7853b5c09..75d501bfe 100644
--- a/setup.py
+++ b/setup.py
@@ -740,4 +740,6 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
 # remove plugin files copied from outside source tree
 if copied_plugins:
     for f in sofiles:
-        os.remove(osp.join(osp.join(osp.join('src','netCDF4'),'plugins'),f))
+        filepath = osp.join(osp.join(osp.join('src','netCDF4'),'plugins'),f)
+        if os.path.exists(filepath):
+            os.remove(filepath)

diff --git a/src/netCDF4/plugins/empty.txt b/src/netCDF4/plugins/empty.txt
new file mode 100644
index 000000000..e69de29bb

From 9dc076ea11b2f418dbb47b3fec639b18d6254d4d Mon Sep 17 00:00:00 2001
From: jswhit
Date: Thu, 23 Jun 2022 16:34:35 -0600
Subject: [PATCH 92/92] update

---
 setup.py                | 3 ---
 src/netCDF4/__init__.py | 7 ++++---
 2 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/setup.py b/setup.py
index 75d501bfe..2eea22e1c 100644
--- a/setup.py
+++ b/setup.py
@@ -728,11 +728,8 @@ def _populate_hdf5_info(dirstosearch, inc_dirs, libs, lib_dirs):
           "Topic :: Software Development :: Libraries :: Python Modules",
           "Topic :: System :: Archiving :: Compression",
           "Operating System :: OS Independent"],
-      #packages=['netCDF4'],
       packages=find_namespace_packages(where="src"),
       package_dir={'':'src'},
-      #data_files=[('netCDF4',data_files)], # doesn't work with pip install
-      #include_package_data = True,
       package_data={"netCDF4.plugins": ["lib__nc*"]},
       ext_modules=ext_modules,
       **setuptools_extra_kwargs)

diff --git a/src/netCDF4/__init__.py b/src/netCDF4/__init__.py
index 46df12db2..e84607bab 100644
--- a/src/netCDF4/__init__.py
+++ b/src/netCDF4/__init__.py
@@ -14,7 +14,8 @@ __all__ =\
 ['Dataset','Variable','Dimension','Group','MFDataset','MFTime','CompoundType','VLType','date2num','num2date','date2index','stringtochar','chartostring','stringtoarr','getlibversion','EnumType','get_chunk_cache','set_chunk_cache']
 # if HDF5_PLUGIN_PATH not set, point to package path if plugins live there
+pluginpath = os.path.join(__path__[0],'plugins')
 if 'HDF5_PLUGIN_PATH' not in os.environ and\
-   (os.path.exists(os.path.join(os.path.join(__path__[0],'plugins'),'lib__nczhdf5filters.so')) or\
-   os.path.exists(os.path.join(os.path.join(__path__[0],'plugins'),'lib__nczhdf5filters.dylib'))):
-    os.environ['HDF5_PLUGIN_PATH']=os.path.join(__path__[0],'plugins')
+   (os.path.exists(os.path.join(pluginpath,'lib__nczhdf5filters.so')) or\
+   os.path.exists(os.path.join(pluginpath,'lib__nczhdf5filters.dylib'))):
+    os.environ['HDF5_PLUGIN_PATH']=pluginpath
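Taken together, these patches mean that a build done with NETCDF_PLUGIN_DIR pointing at the plugin directory produced by a netcdf-c >= 4.9.0 build (as the workflows above now do) bundles the lib__nc* filter plugins under netCDF4/plugins and points HDF5_PLUGIN_PATH at them on import. The sketch below is not part of the patch series; names are illustrative, and it assumes such a build (or HDF5_PLUGIN_PATH exported by hand before import).

>>> import os
>>> import numpy as np
>>> from netCDF4 import Dataset
>>> "HDF5_PLUGIN_PATH" in os.environ   # set by the import above when the bundled plugins are present
True
>>> nc = Dataset("zstdtest.nc", "w")   # illustrative file name
>>> _ = nc.createDimension("x", 1000)
>>> v = nc.createVariable("v", "f4", ("x",), compression="zstd", complevel=4)
>>> v[:] = np.random.uniform(size=1000)
>>> nc.close()

If the plugins were not bundled at build time (for example a NO_PLUGINS test run, or NETCDF_PLUGIN_DIR unset), HDF5_PLUGIN_PATH has to be exported to the directory containing the lib__nc* shared objects before importing netCDF4 for the zstd, bzip2 and blosc filters to be found.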