Default behaviour when no data is provided for a variable #4

lbesnard · 2018-09-27T05:11:11Z

It looks like it is not possible to write a netcdf file without any values attached to a variable.
see
https://github.com/aodn/aodn-netcdf-tools/blob/master/ncwriter/template.py#L410

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object. Maybe "var_values" could be used or something else.
The issue with using "value" is that
asattr(template.variables["VAR_TO_TEST"]['attributes'], 'values') always returns true even if there aren't any values attached

mhidas · 2018-10-01T06:01:49Z

It looks like it is not possible to write a netcdf file without any values attached to a variable.

Is that something you would want to do?
We could write a variable filled with fill values by default, I guess. However unless there's a good use case requiring that, I think this would indicate the values were not set by mistake, so the code should raise an exception.

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object.

There is a distinction between the values attribute of the dictionary object (which is a method), e,g,

In[7]: template.variables['TEMP'].values
Out[7]: <bound method OrderedDict.values of OrderedDict([(u'dimensions', [u'TIME', u'DEPTH']), (u'type', u'float32'), (u'attributes', OrderedDict([(u'standard_name', u'sea_water_temperature'), (u'units', u'degC'), (u'valid_min', 0.0), (u'valid_max', 42.0)]))])>

... and the item in the dictionary corresponding to the "values" key, e.g.

In[17]: template.variables['TEMP']['values']
Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

But you're right that it could be a bit confusing. My preferred alternative names are "array" or "data".

ocehugo · 2018-10-01T23:46:01Z

It looks like it is not possible to write a NetCDF file without any values attached to a variable. see https://github.com/aodn/aodn-netcdf-tools/blob/master/ncwriter/template.py#L410

It's completely fine, because if no values are found the operation is :

ncobj['var'][:] = None

this will fill the array with NaNs. Maybe this is not the "best" though and it's best to leave the values as empty (save space).

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object. Maybe "var_values" could be used or something else.
The issue with using "value" is that
asattr(template.variables["VAR_TO_TEST"]['attributes'], 'values') always returns true even if there aren't any values attached

Yeap, disambiguation is required because of dicts. I vote for data.

ggalibert · 2018-10-02T05:46:35Z

The use case of a variable with no value is mostly to group some metadata together when that is relevant under a common container variable. See for example https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/timeSeriesOrthogonal.cdl where:

        int platform_variable; //............................................ RECOMMENDED - a container variable storing information about the platform. If more than one, can expand each attribute into a variable. For example, platform_call_sign and platform_ncei_code. See instrument_parameter_variable for an example.
                platform_variable:long_name = "" ; //........................ RECOMMENDED - Provide a descriptive, long name for this variable. 
                platform_variable:comment = "" ; //.......................... RECOMMENDED - Add useful, additional information here.
                platform_variable:call_sign = "" ; //........................ RECOMMENDED - This attribute identifies the call sign of the platform.          
                platform_variable:ncei_code = ""; //......................... RECOMMENDED - This attribute identifies the NCEI code of the platform. Look at http://www.nodc.noaa.gov/cgi-bin/OAS/prd/platform to find if NCEI codes are available.          
                platform_variable:wmo_code = "";//........................... RECOMMENDED - This attribute identifies the wmo code of the platform. Information on getting WMO codes is available at http://www.wmo.int/pages/prog/amp/mmop/wmo-number-rules.html          
                platform_variable:imo_code  = "";//.......................... RECOMMENDED - This attribute identifies the International Maritime Organization (IMO) number assigned by Lloyd's register.

mhidas · 2018-10-02T07:03:41Z

@ggalibert Ok, in that case the variable value is not important, but you could still set it to something. I think that's still a bit different from writing a file without explicitly setting the variables values to something (even if just arbitrary values, fill values or NaNs).

mhidas · 2018-10-02T07:05:15Z

What I'm saying is, if you just do this:

template = DatasetTemplate(dimensions={'X': 10},
                           variables={'X': {'type': 'float', 'dimensions': ['X']}}
                           )
template.to_netcdf('test2.nc')

You've probably just forgotten to set template.variables['X']['values'], so the code should raise an exception.

What does everyone think?

If you actually want it to write the variable with just fill values, all you have to do is add something like this before writing the file:

template.variables['X']['_FillValue'] = -999.
template.variables['X']['values'] = np.full(10, -999.0)

lbesnard · 2018-10-02T07:08:37Z

I don't see why adding this constraint to NetCDF file which is not compulsory in the NetCDF format. We have to keep in mind this tool could be used by others

ggalibert · 2018-10-02T23:28:31Z

You've probably just forgotten to set template.variables['X']['values'], so the code should raise an exception.

I agree, this would force everyone to produce "neat", non unexpected files.

mhidas · 2018-10-03T06:34:55Z

@lbesnard I don't think it's a constraint, really. It's more a safety feature. The main use case for this code is for writing actual data to a file, not fill values. If you want to write fill values, you can, but you have to do so explicitly.

We could include a bit of a shortcut, so that setting template.variables['X']['values'] = None will translate to a variable with fill values (rather than NaNs). You could even set this as a default in your template.

mhidas · 2018-10-03T08:01:33Z

I've added two more commits to #7 in response to the above:

renamed 'values' to 'data';
raise error if 'data' not provided;
all fill values if 'data' is None

mhidas · 2018-10-03T08:02:53Z

Please review and merge if happy.

lbesnard · 2018-10-08T04:03:06Z

@lbesnard I don't think it's a constraint, really. It's more a safety feature.

Well I disagree. The case of guillaume's example is a good one.

mhidas · 2018-10-09T06:54:26Z

And we're still allowing for that case. In fact I've made it even easier now. If you want to create a variable with all fill values, you can even specify it in the template. E.g. in the case of Guillaume's example, all you'd need to do is add this to the variables listed in your template:

        "platform_variable": {
            "type": "int",
            "data": null
        }

Given that this is the less common use case, I think that's an acceptable amount of "overhead" required to make it work.

lbesnard · 2018-10-15T01:11:40Z

@mhidas I don't think this should be closed as it hasn't been decided all together.

What do you do in the case of a string variable ? Do you had NaN ? Won't this fail? I think it actually adds more "overhead" to create logic on all of this by restricting what NetCDF actually allows you to do

ggalibert · 2018-10-15T01:30:20Z

@lbesnard , what is the difference between a variable with a "NaN" or with "nothing" in it?

What do you do in the case of a string variable ? Do you had NaN ?

I suppose when we say "NaN" we mean fill values. It is possible to set _FillValue for any variable type and it works.

ocehugo · 2018-10-16T02:26:04Z

My 2 cents:

In concept, enforcing writing of data is wrong, worse even is raising an exception because of no "data" provided. No data provided is a completely fine action.

Although I was filling the values in the original code, I don't think it should be the default.

If something is not provided, leave empty. If you want to add data, be explicit in the code. And I think the worst time to be explicit about "dynamical" or "data" is at the template itself.

In my ideal world, templates should only be used to provide static attributes, although defining static is somehow complicated in the long run.
I think that providing "fillup data" of any kind in the templates should be prohibited. This avoid typical problems (precision, conversion, etc) and abuse of the tool.

Even some integer/float attributes is not advisable given they can change or inject unwanted values in some cases (say flagvalues are defined at templates but actual values extend beyond that limit).

lbesnard added the bug Something isn't working label Sep 27, 2018

lbesnard changed the title ~~use of attribute value~~ use of attribute "value" Sep 27, 2018

mhidas closed this as completed Oct 11, 2018

lbesnard reopened this Oct 15, 2018

mhidas added this to the v1.0 milestone Oct 16, 2018

mhidas changed the title ~~use of attribute "value"~~ Default behaviour when no data is provided for a variable Nov 14, 2018

mhidas changed the title ~~Default behaviour when no data is provided for a variable~~ ncwriter: Default behaviour when no data is provided for a variable Aug 20, 2019

mhidas changed the title ~~ncwriter: Default behaviour when no data is provided for a variable~~ Default behaviour when no data is provided for a variable Aug 20, 2019

mhidas added the ncwriter Related to creating netCDF files based on templates label Aug 20, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Default behaviour when no data is provided for a variable #4

Default behaviour when no data is provided for a variable #4

lbesnard commented Sep 27, 2018

mhidas commented Oct 1, 2018

ocehugo commented Oct 1, 2018 •

edited

Loading

ggalibert commented Oct 2, 2018

mhidas commented Oct 2, 2018

mhidas commented Oct 2, 2018

lbesnard commented Oct 2, 2018

ggalibert commented Oct 2, 2018

mhidas commented Oct 3, 2018

mhidas commented Oct 3, 2018

mhidas commented Oct 3, 2018

lbesnard commented Oct 8, 2018

mhidas commented Oct 9, 2018

lbesnard commented Oct 15, 2018

ggalibert commented Oct 15, 2018

ocehugo commented Oct 16, 2018 •

edited

Loading

Default behaviour when no data is provided for a variable #4

Default behaviour when no data is provided for a variable #4

Comments

lbesnard commented Sep 27, 2018

mhidas commented Oct 1, 2018

ocehugo commented Oct 1, 2018 • edited Loading

ggalibert commented Oct 2, 2018

mhidas commented Oct 2, 2018

mhidas commented Oct 2, 2018

lbesnard commented Oct 2, 2018

ggalibert commented Oct 2, 2018

mhidas commented Oct 3, 2018

mhidas commented Oct 3, 2018

mhidas commented Oct 3, 2018

lbesnard commented Oct 8, 2018

mhidas commented Oct 9, 2018

lbesnard commented Oct 15, 2018

ggalibert commented Oct 15, 2018

ocehugo commented Oct 16, 2018 • edited Loading

ocehugo commented Oct 1, 2018 •

edited

Loading

ocehugo commented Oct 16, 2018 •

edited

Loading