Default behaviour when no data is provided for a variable #4
Is that something you would want to do?
There is a distinction between the values attribute of the dictionary object (which is a method), e.g.

In[7]: template.variables['TEMP'].values
Out[7]: <bound method OrderedDict.values of OrderedDict([(u'dimensions', [u'TIME', u'DEPTH']), (u'type', u'float32'), (u'attributes', OrderedDict([(u'standard_name', u'sea_water_temperature'), (u'units', u'degC'), (u'valid_min', 0.0), (u'valid_max', 42.0)]))])>

... and the item in the dictionary corresponding to the "values" key, e.g.

In[17]: template.variables['TEMP']['values']
Out[17]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

But you're right that it could be a bit confusing. My preferred alternative names are "array" or "data".
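A plain `OrderedDict` shows both lookups side by side, and why renaming the key (to "data", say) removes the ambiguity. The variable spec below is a trimmed stand-in for the template entry shown above, not the actual template object:

```python
from collections import OrderedDict

# Trimmed stand-in for template.variables['TEMP'] (keys assumed from the session above).
var = OrderedDict([
    ('dimensions', ['TIME', 'DEPTH']),
    ('type', 'float32'),
    ('values', [0, 1, 2, 3]),
])

# Attribute access finds the dict method, not the stored data.
print(callable(var.values))   # True
# Item access finds the stored data.
print(var['values'])          # [0, 1, 2, 3]

# Renaming the key removes the visual clash with the .values method.
var['data'] = var.pop('values')
print('values' in var, 'data' in var)  # False True
```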
It's completely fine, because if no values are found the operation is:
This will fill the array with NaNs. Maybe this is not the "best" approach, though, and it would be better to leave the values empty (to save space).
Yep, disambiguation is required because of dicts. I vote for data.
The use case of a variable with no value is mostly to group related metadata together under a common container variable, where that is relevant. See for example https://www.nodc.noaa.gov/data/formats/netcdf/v2.0/timeSeriesOrthogonal.cdl where:
@ggalibert OK, in that case the variable's values are not important, but you could still set them to something. I think that's still a bit different from writing a file without explicitly setting the variable's values to anything (even if just arbitrary values, fill values or NaNs).
What I'm saying is, if you just do this:

template = DatasetTemplate(dimensions={'X': 10},
                           variables={'X': {'type': 'float', 'dimensions': ['X']}})
template.to_netcdf('test2.nc')

You've probably just forgotten to set

What does everyone think? If you actually want it to write the variable with just fill values, all you have to do is add something like this before writing the file:

import numpy as np
template.variables['X']['_FillValue'] = -999.
template.variables['X']['values'] = np.full(10, -999.0)
I don't see why we should add this constraint to NetCDF files when it isn't required by the NetCDF format. We have to keep in mind that this tool could be used by others.
I agree, this would force everyone to produce "neat", predictable files.
@lbesnard I don't think it's a constraint, really. It's more of a safety feature. The main use case for this code is writing actual data to a file, not fill values. If you want to write fill values, you can, but you have to do so explicitly. We could include a bit of a shortcut, so that setting
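The "safety feature" described here amounts to a check before writing: refuse to write a variable whose data was never set, unless the caller opts in. A minimal sketch on plain dicts; the function name and the `allow_missing` flag are hypothetical, not part of ncwriter:

```python
def check_variables_have_data(variables, allow_missing=False):
    """Hypothetical pre-write check (not ncwriter's actual API).

    Returns the names of variables with no 'data' entry, and raises
    ValueError for them unless allow_missing is True.
    """
    missing = [name for name, spec in variables.items() if 'data' not in spec]
    if missing and not allow_missing:
        raise ValueError('no data provided for: ' + ', '.join(missing))
    return missing


# The template from the thread, with no data set for X.
variables = {'X': {'type': 'float', 'dimensions': ['X']}}

try:
    check_variables_have_data(variables)
except ValueError as err:
    print(err)  # no data provided for: X

# Opting in explicitly lets the write proceed (e.g. with fill values).
print(check_variables_have_data(variables, allow_missing=True))  # ['X']
```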
I've added two more commits to #7 in response to the above:
Please review and merge if happy.
Well, I disagree. Guillaume's example is a good one.
And we're still allowing for that case. In fact, I've made it even easier now. If you want to create a variable with all fill values, you can even specify it in the template. E.g. in the case of Guillaume's example, all you'd need to do is add this to the variables listed in your template:

"platform_variable": {
    "type": "int",
    "data": null
}

Given that this is the less common use case, I think that's an acceptable amount of "overhead" required to make it work.
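The template shortcut relies on JSON `null` loading as Python `None`, which lets a writer tell "data deliberately left as fill values" apart from "data never set". A sketch of that three-way distinction; `data_status` is a hypothetical helper, and the template layout is assumed from the snippet in this comment:

```python
import json

template_json = '''
{
  "variables": {
    "platform_variable": {"type": "int", "data": null},
    "TEMP": {"type": "float", "dimensions": ["TIME"]}
  }
}
'''

def data_status(spec):
    """Classify a variable spec (hypothetical helper, not ncwriter code)."""
    if 'data' not in spec:
        return 'missing'   # no data key at all: probably forgotten
    if spec['data'] is None:
        return 'fill'      # JSON null: explicitly asked for all fill values
    return 'provided'

variables = json.loads(template_json)['variables']
for name, spec in variables.items():
    print(name, data_status(spec))
# platform_variable fill
# TEMP missing
```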
@mhidas I don't think this should be closed, as it hasn't been agreed on by everyone. What do you do in the case of a string variable? Do you add NaN? Won't this fail? I think building logic around all of this, restricting what NetCDF actually allows you to do, actually adds more "overhead".
@lbesnard, what is the difference between a variable with "NaN" and one with "nothing" in it?
I suppose when we say "NaN" we mean fill values. It is possible to set _FillValue for any variable type, and it works.
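The string-variable question and the reply above come down to the difference between NaN and a _FillValue: NaN exists only for floating-point types, while a fill value of the variable's own type works for any type. A small numpy illustration; the -999 fill value is only an example, echoing the one used earlier in the thread:

```python
import numpy as np

# NaN is a floating-point value, so it can fill float variables...
temp = np.zeros(3, dtype='float32')
temp[:] = np.nan

# ...but integer types have no NaN representation.
count = np.zeros(3, dtype='int32')
try:
    count[0] = np.nan
except ValueError as err:
    print('int32 cannot hold NaN:', err)

# A _FillValue of the variable's own type works for any type.
# (-999 is an illustrative choice, not a NetCDF default.)
fill_value = np.int32(-999)
count[:] = fill_value
print(count)  # [-999 -999 -999]
```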
My 2 cents: conceptually, enforcing the writing of data is wrong, and raising an exception because no "data" was provided is even worse. Providing no data is a completely fine action. Although I was filling the values in the original code, I don't think that should be the default. If something is not provided, leave it empty. If you want to add data, be explicit in the code. And I think the worst place to be explicit about "dynamic" content or "data" is in the template itself. In my ideal world, templates should only be used to provide static attributes, although defining "static" is somewhat complicated in the long run. Even some integer/float attributes are not advisable, given they can change or inject unwanted values in some cases (say flag_values are defined in templates but the actual values extend beyond that range).
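The approach argued for here (static attributes in the template, data attached explicitly in code, anything not provided left empty) could look roughly like this. `build_variables` and the template layout are illustrative assumptions, not ncwriter code:

```python
import copy

# Static template: attributes only, no data (layout assumed from this thread).
TEMPLATE = {
    'variables': {
        'TEMP': {'type': 'float32', 'dimensions': ['TIME'],
                 'attributes': {'units': 'degC'}},
        'platform_variable': {'type': 'int', 'attributes': {}},
    }
}

def build_variables(template, data):
    """Hypothetical helper: copy the template and attach data supplied in code.

    Variables absent from `data` get no 'data' key, so a writer following
    this policy would simply leave them empty instead of raising.
    """
    variables = copy.deepcopy(template['variables'])  # keep the template static
    for name, values in data.items():
        variables[name]['data'] = values
    return variables

variables = build_variables(TEMPLATE, {'TEMP': [12.5, 12.7, 12.9]})
print(sorted(name for name, spec in variables.items() if 'data' not in spec))
# ['platform_variable']
```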
It looks like it is not possible to write a netcdf file without any values attached to a variable.
see
https://github.com/aodn/aodn-netcdf-tools/blob/master/ncwriter/template.py#L410
Also, I don't think the keyword "values" should be used, as it is already used by the Python dictionary object (as a method). Maybe "var_values" or something else could be used instead.
The issue with using "values" is that
hasattr(template.variables["VAR_TO_TEST"]['attributes'], 'values') always returns True, even if there aren't any values attached.
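`hasattr` checks attributes, not keys, and every dict has a `values` method, so that check can never be False. Testing for the key directly avoids the problem; a small illustration with a stand-in attributes dict:

```python
from collections import OrderedDict

attributes = OrderedDict([('units', 'degC')])   # no "values" entry at all

# hasattr() looks up attributes, and every dict has a .values method...
print(hasattr(attributes, 'values'))      # True, even though there is no such key
# ...so test for the key instead.
print('values' in attributes)             # False
print(attributes.get('values') is None)   # True
```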