Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default behaviour when no data is provided for a variable #4

Open
lbesnard opened this issue Sep 27, 2018 · 15 comments
Open

Default behaviour when no data is provided for a variable #4

lbesnard opened this issue Sep 27, 2018 · 15 comments
Labels
bug Something isn't working ncwriter Related to creating netCDF files based on templates
Milestone

Comments

@lbesnard
Copy link
Contributor

It looks like it is not possible to write a netcdf file without any values attached to a variable.
see
https://github.com/aodn/aodn-netcdf-tools/blob/master/ncwriter/template.py#L410

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object. Maybe "var_values" could be used or something else.
The issue with using "value" is that
asattr(template.variables["VAR_TO_TEST"]['attributes'], 'values') always returns true even if there aren't any values attached

@lbesnard lbesnard added the bug Something isn't working label Sep 27, 2018
@lbesnard lbesnard changed the title use of attribute value use of attribute "value" Sep 27, 2018
@mhidas
Copy link
Contributor

mhidas commented Oct 1, 2018

It looks like it is not possible to write a netcdf file without any values attached to a variable.

Is that something you would want to do?
We could write a variable filled with fill values by default, I guess. However unless there's a good use case requiring that, I think this would indicate the values were not set by mistake, so the code should raise an exception.

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object.

There is a distinction between the values attribute of the dictionary object (which is a method), e,g,

In[7]: template.variables['TEMP'].values
Out[7]: <bound method OrderedDict.values of OrderedDict([(u'dimensions', [u'TIME', u'DEPTH']), (u'type', u'float32'), (u'attributes', OrderedDict([(u'standard_name', u'sea_water_temperature'), (u'units', u'degC'), (u'valid_min', 0.0), (u'valid_max', 42.0)]))])>

... and the item in the dictionary corresponding to the "values" key, e.g.

In[17]: template.variables['TEMP']['values']
Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

But you're right that it could be a bit confusing. My preferred alternative names are "array" or "data".

@ocehugo
Copy link
Contributor

ocehugo commented Oct 1, 2018

It looks like it is not possible to write a NetCDF file without any values attached to a variable. see https://github.com/aodn/aodn-netcdf-tools/blob/master/ncwriter/template.py#L410

It's completely fine, because if no values are found the operation is :

ncobj['var'][:] = None

this will fill the array with NaNs. Maybe this is not the "best" though and it's best to leave the values as empty (save space).

Also I don't think the keyword "value" should be used, as it is used by default by python dictionary object. Maybe "var_values" could be used or something else.
The issue with using "value" is that
asattr(template.variables["VAR_TO_TEST"]['attributes'], 'values') always returns true even if there aren't any values attached

Yeap, disambiguation is required because of dicts. I vote for data.

@ggalibert
Copy link
Contributor

The use case of a variable with no value is mostly to group some metadata together when that is relevant under a common container variable. See for example https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/timeSeriesOrthogonal.cdl where:

        int platform_variable; //............................................ RECOMMENDED - a container variable storing information about the platform. If more than one, can expand each attribute into a variable. For example, platform_call_sign and platform_ncei_code. See instrument_parameter_variable for an example.
                platform_variable:long_name = "" ; //........................ RECOMMENDED - Provide a descriptive, long name for this variable. 
                platform_variable:comment = "" ; //.......................... RECOMMENDED - Add useful, additional information here.
                platform_variable:call_sign = "" ; //........................ RECOMMENDED - This attribute identifies the call sign of the platform.          
                platform_variable:ncei_code = ""; //......................... RECOMMENDED - This attribute identifies the NCEI code of the platform. Look at http://www.nodc.noaa.gov/cgi-bin/OAS/prd/platform to find if NCEI codes are available.          
                platform_variable:wmo_code = "";//........................... RECOMMENDED - This attribute identifies the wmo code of the platform. Information on getting WMO codes is available at http://www.wmo.int/pages/prog/amp/mmop/wmo-number-rules.html          
                platform_variable:imo_code  = "";//.......................... RECOMMENDED - This attribute identifies the International Maritime Organization (IMO) number assigned by Lloyd's register.

@mhidas
Copy link
Contributor

mhidas commented Oct 2, 2018

@ggalibert Ok, in that case the variable value is not important, but you could still set it to something. I think that's still a bit different from writing a file without explicitly setting the variables values to something (even if just arbitrary values, fill values or NaNs).

@mhidas
Copy link
Contributor

mhidas commented Oct 2, 2018

What I'm saying is, if you just do this:

template = DatasetTemplate(dimensions={'X': 10},
                           variables={'X': {'type': 'float', 'dimensions': ['X']}}
                           )
template.to_netcdf('test2.nc')

You've probably just forgotten to set template.variables['X']['values'], so the code should raise an exception.

What does everyone think?

If you actually want it to write the variable with just fill values, all you have to do is add something like this before writing the file:

template.variables['X']['_FillValue'] = -999.
template.variables['X']['values'] = np.full(10, -999.0)

@lbesnard
Copy link
Contributor Author

lbesnard commented Oct 2, 2018

I don't see why adding this constraint to NetCDF file which is not compulsory in the NetCDF format. We have to keep in mind this tool could be used by others

@ggalibert
Copy link
Contributor

You've probably just forgotten to set template.variables['X']['values'], so the code should raise an exception.

I agree, this would force everyone to produce "neat", non unexpected files.

@mhidas
Copy link
Contributor

mhidas commented Oct 3, 2018

@lbesnard I don't think it's a constraint, really. It's more a safety feature. The main use case for this code is for writing actual data to a file, not fill values. If you want to write fill values, you can, but you have to do so explicitly.

We could include a bit of a shortcut, so that setting template.variables['X']['values'] = None will translate to a variable with fill values (rather than NaNs). You could even set this as a default in your template.

@mhidas
Copy link
Contributor

mhidas commented Oct 3, 2018

I've added two more commits to #7 in response to the above:

  • renamed 'values' to 'data';
  • raise error if 'data' not provided;
  • all fill values if 'data' is None

@mhidas
Copy link
Contributor

mhidas commented Oct 3, 2018

Please review and merge if happy.

@lbesnard
Copy link
Contributor Author

lbesnard commented Oct 8, 2018

@lbesnard I don't think it's a constraint, really. It's more a safety feature.

Well I disagree. The case of guillaume's example is a good one.

@mhidas
Copy link
Contributor

mhidas commented Oct 9, 2018

And we're still allowing for that case. In fact I've made it even easier now. If you want to create a variable with all fill values, you can even specify it in the template. E.g. in the case of Guillaume's example, all you'd need to do is add this to the variables listed in your template:

        "platform_variable": {
            "type": "int",
            "data": null
        }

Given that this is the less common use case, I think that's an acceptable amount of "overhead" required to make it work.

@mhidas mhidas closed this as completed Oct 11, 2018
@lbesnard
Copy link
Contributor Author

@mhidas I don't think this should be closed as it hasn't been decided all together.

What do you do in the case of a string variable ? Do you had NaN ? Won't this fail? I think it actually adds more "overhead" to create logic on all of this by restricting what NetCDF actually allows you to do

@lbesnard lbesnard reopened this Oct 15, 2018
@ggalibert
Copy link
Contributor

@lbesnard , what is the difference between a variable with a "NaN" or with "nothing" in it?

What do you do in the case of a string variable ? Do you had NaN ?

I suppose when we say "NaN" we mean fill values. It is possible to set _FillValue for any variable type and it works.

@ocehugo
Copy link
Contributor

ocehugo commented Oct 16, 2018

My 2 cents:

In concept, enforcing writing of data is wrong, worse even is raising an exception because of no "data" provided. No data provided is a completely fine action.

Although I was filling the values in the original code, I don't think it should be the default.

If something is not provided, leave empty. If you want to add data, be explicit in the code. And I think the worst time to be explicit about "dynamical" or "data" is at the template itself.

In my ideal world, templates should only be used to provide static attributes, although defining static is somehow complicated in the long run.
I think that providing "fillup data" of any kind in the templates should be prohibited. This avoid typical problems (precision, conversion, etc) and abuse of the tool.

Even some integer/float attributes is not advisable given they can change or inject unwanted values in some cases (say flagvalues are defined at templates but actual values extend beyond that limit).

@mhidas mhidas added this to the v1.0 milestone Oct 16, 2018
@mhidas mhidas changed the title use of attribute "value" Default behaviour when no data is provided for a variable Nov 14, 2018
@mhidas mhidas changed the title Default behaviour when no data is provided for a variable ncwriter: Default behaviour when no data is provided for a variable Aug 20, 2019
@mhidas mhidas changed the title ncwriter: Default behaviour when no data is provided for a variable Default behaviour when no data is provided for a variable Aug 20, 2019
@mhidas mhidas added the ncwriter Related to creating netCDF files based on templates label Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working ncwriter Related to creating netCDF files based on templates
Projects
None yet
Development

No branches or pull requests

4 participants