Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637

BENR0 · 2021-04-13T09:56:15Z

The default filename pattern of the cf writer is changed to the format the satpy_cf_nc reader expects. Also the "instrument" attribute which the satpy_cf_nc reader expects is now added by the reader.

I have been trying out the satpy_cf_nc reader and stumbled over some things.

The reader expects a certain filename format which is documented. While this is not a big point never the less I think to be consistent the cf writer should by default use the same filename format because as a user I would expect to be able to read a file written with the default settings.
The reader also expects a global attribute instrument in the get_sensor function of the reader. This is currently not documented and is not added by the cf writer but needs to be added by the user by supplying the header_attrs parameter. With the same argument I think the reader should add this.

Apart from this I am wondering why the attribute is named instrument in the satpy_cf_nc reader while everywhere else it seems to be named sensor? Maybe this should be renamed?
Additionally I was wondering why this is retrieved from the global attributes since the reader relies on dynamic datasets anyway and retrieves some attributes anyway in _assign_ds_info which could also be used to get the sensor. I guess the sensor information might not be available in all readers?
However this is not to say that global attributes should not be used. Actually maybe the cf writer should write attributes common to all datasets to the global attributes similar to the to_xarray_dataset scene method?

Note: a lot of the cf writer tests are failing due to the instrument attribute addition. I wanted to hear your thoughts on this before fixing these.

djhoese · 2021-05-15T16:43:24Z

I think the main purpose of this PR has been handled now in some other PR. You can see the current file types here:

satpy/satpy/etc/readers/satpy_cf_nc.yaml

Lines 14 to 15 in 153d868

    
           - '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc' 
        
           - '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'

I think we can close this? If you don't think so, let me know and we can reopen it.

BENR0 · 2021-05-17T07:26:40Z

Well I would have to see the PR. The file type patterns above still don't match the patterns the cf writer uses by default. Additionally the cf reader needs the instrument attribute in the metadata of the netcdf which also does not get written by default. So I think it would be good to keep this open.

djhoese · 2021-05-17T15:51:18Z

Here's what your (this) PR has:

'{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'

And here's what is there currently:

'{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'

I think those are exactly the same, right?

BENR0 · 2021-05-17T17:53:12Z

:-D yes exactly the filename pattern from the reader the and writer should be the same. With my changes they are. Currently the filename pattern of the writer is:

satpy/satpy/etc/writers/cf.yaml

Line 5 in 62c0e83

filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc'

djhoese · 2021-05-17T18:05:57Z

Oh! 🤦 Your changes are to the writer. @TAlonglong @martin @sfinkens what do you think? Should the CF writer's default output filename be the more complex name in this PR (see my comment above) or should the CF reader be updated to also match the current simpler writer default filename (see above).

mraspaud · 2021-05-17T18:55:43Z

Yes, the more complete name is better imo

sfinkens · 2021-05-18T08:08:06Z

Thanks for looking into this @BENR0! I agree that the more complete name is better.

Regarding the attribute name: sensor is certainly more consistent in the Satpy world. You could argue that the netCDF file is for people from the CF community. And they are probably more familiar with instrument. At least that is what's recommended by ACDD. I have a tendency to keep the instrument name.

Thinking a bit further: There's actually a standardized vocabulary for platforms, instruments etc (GCMD). For example you would specify

platform = "Earth Observation Satellites > METEOSAT > METEOSAT-9"
instrument = "Earth Remote Sensing Instruments > Passive Remote Sensing > Spectrometers/Radiometers > Imaging Spectrometers/Radiometers > SEVIRI > Spinning Enhanced Visible and Infrared Imager"
instrument_vocabulary = "GCMD Instruments, Version 8.1"
platform_vocabulary = "GCMD Platforms, Version 8.1"

I think it would be nice if all the readers followed that convention, if possible. But that's clearly beyond the scope of this PR.

BENR0 · 2021-05-18T08:41:54Z

@sfinkens thanks for the info.

Some more thoughts from my side about the instrument attribute. In this PR I just added writing of this attribute to the cf writer because the cf reader complains otherwise. That said from what I grasp the cf reader does not really depend on that information itself but just for the sensor_names property. So maybe the cf reader should be changed to not need the instrument attribute?
If the writer should add that attribute it would be necessary to ensure that every dataset has it (I am not sure if currently it is possible that a dataset does not have it?).
Furthermore in this PR the sensor attribute of the first dataset is used. It might be necessary to refactor that to account for cases where not all datasets in the scene are from the same sensor.

sfinkens · 2021-05-18T14:30:25Z

In this PR I just added writing of this attribute to the cf writer because the cf reader complains otherwise

Yes that's good, reader and writer should be consistent!

So maybe the cf reader should be changed to not need the instrument attribute?

The sensor_names property is prescribed by the base class so we better keep it :)

Furthermore in this PR the sensor attribute of the first dataset is used. It might be necessary to refactor that to account for cases where not all datasets in the scene are from the same sensor.

I think that's fine for now.

BENR0 · 2022-05-25T09:20:38Z

@djhoese @mraspaud @sfinkens working on #1695 I stumbled about this problem again and I think this is still relevant. Especially the instrument attribute is not very intuitive to users. For the default filenames in the reader I think it would also be good to even add {filename}.nc to the list like for the generic_image reader since a user might just want to save to a custom name which he can not read back in since the cf reader does not find any files.

BENR0 · 2022-08-16T09:31:08Z

@djhoese @mraspaud @sfinkens this still is an issue. I have merged the current main again and pushed to this branch but since this PR got closed it does not show up. Should I open another PR?

Just saw #2176 which is related to this PR regarding the missing instrument.

djhoese

Thanks for working on updating this again. I've re-opened the PR. Looks like I closed it and forgot I had done that even though the conversation clearly led to it needing to be reopened.

I've made a few comments and things I'm questioning. There are few things in the writer that I don't think are needed anymore. Sorry for taking so long to get to this PR that part of it were handled by others.

satpy/writers/cf_writer.py

satpy/etc/readers/satpy_cf_nc.yaml

djhoese · 2022-08-22T13:52:33Z

Not sure how but it seems the tests are failing because pygrib is missing?

djhoese · 2022-08-22T13:55:49Z

Oh the pygrib warnings can be ignored.

The problem is that group_files in the failing test is matching the CF reader instead of the abi_l1b reader. I think the reader could be specified on the group files call. Do you want to try adding reader='abi_l1b' to that test.

 ________________________ TestGroupFiles.test_no_reader _________________________

self = <satpy.tests.test_readers.TestGroupFiles testMethod=test_no_reader>

    def test_no_reader(self):
        """Test that reader does not need to be provided."""
        from satpy.readers import group_files
    
        # without files it's going to be an empty result
        assert group_files([]) == []
        groups = group_files(self.g16_files)
>       self.assertEqual(6, len(groups))
E       AssertionError: 6 != 1

satpy/tests/test_readers.py:723: AssertionError

codecov · 2022-08-22T15:44:33Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.94%. Comparing base (417c768) to head (82b858e).
Report is 390 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1637      +/-   ##
==========================================
- Coverage   95.94%   95.94%   -0.01%     
==========================================
  Files         366      366              
  Lines       53613    53504     -109     
==========================================
- Hits        51441    51332     -109     
  Misses       2172     2172

Flag	Coverage Δ
behaviourtests	`4.04% <ø> (+<0.01%)`	⬆️
unittests	`96.03% <ø> (-0.01%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coveralls · 2022-08-22T15:58:56Z

Coverage remained the same at 94.747% when pulling e82d4fd on BENR0:feat_default_filename_cf_writer into 5cc09e0 on pytroll:main.

BENR0 · 2022-10-13T10:57:00Z

@djhoese @mraspaud just looking through some PR's to clean up some open stuff. Is there something else than merging main again which needs to be done here or can this be merged?

gerritholl · 2022-10-13T11:47:33Z

@djhoese @mraspaud just looking through some PR's to clean up some open stuff. Is there something else than merging main again which needs to be done here or can this be merged?

It's marked as draft. Draft pull requests cannot be merged.

gerritholl · 2023-01-05T09:59:53Z

satpy/etc/readers/satpy_cf_nc.yaml

@@ -17,3 +17,4 @@ file_types:
        file_patterns:
         - '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
         - '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
+         - '{filename}.nc'


For consistency with the GRIB2 reader, could this be {stem}.nc? stem would also be more accurate, as .nc is part of the filename.

@gerritholl I would be ok with that. I just added filename based on the generic_image reader config.

fine by me.

I vote for stem also. The generic_image can be updated later.

mraspaud

Thanks for this PR, I just have a question regarding the group file test.

mraspaud · 2023-01-09T12:01:10Z

satpy/tests/test_readers.py

@@ -751,7 +751,7 @@ def test_no_reader(self):

        # without files it's going to be an empty result
        assert group_files([]) == []
-        groups = group_files(self.g16_files)
+        groups = group_files(self.g16_files, reader='abi_l1b')


I'm not sure I understand what this has to do with this PR. Moreover, the docstring of this test says that the we want to check that we can skip the reader argument, and with change does the opposite?

@BENR0 can you comment on this?

@mraspaud what you are saying makes sense to me. I honestly didn't check what that test does I just made this change based on @djhoese suggestion so maybe he can shed more light on this.

Whoa I did not realize what the test was actually trying to accomplish. This is one downside of the generic filename in the reader. It matches everything and has the change to "steal" matches from other readers. Either:

This test gets updated to expect 1 group of files (not great).

The reader gets updated with group keys to sort by start time...which might work and fix this test.

Remove the default file pattern from the reader

Well obviously I am against option 3. :-D. I think removing a "feature" because a test fails is not the way to go. I am not sure which of the remaining options is the best way to move forward.

Satpy needs a strategy to choose a reader if multiple readers match a file pattern. I think the current strategy is that the first reader with a match wins, but the order of readers is not explicitly defined?

Reader order may come down to:

satpy/satpy/readers/__init__.py

Line 328 in c0b69be

reader_configs = glob_config(os.path.join('readers', '*.yaml'), search_dirs=paths)

in most cases. There may be others. It may get even more complicated when we look at that glob_configs function since it is searching not only Satpy's configs but any user configured paths too:

satpy/satpy/_config.py

Lines 156 to 184 in c0b69be

def config_search_paths(filename, search_dirs=None, **kwargs):

"""Get series of configuration base paths where Satpy configs are located."""

if search_dirs is None:

search_dirs = get_config_path_safe()[::-1]

paths = [filename, os.path.basename(filename)]

paths += [os.path.join(search_dir, filename) for search_dir in search_dirs]

paths += [os.path.join(PACKAGE_CONFIG_PATH, filename)]

paths = [os.path.abspath(path) for path in paths]

if kwargs.get("check_exists", True):

paths = [x for x in paths if os.path.isfile(x)]

paths = list(OrderedDict.fromkeys(paths))

# flip the order of the list so builtins are loaded first

return paths[::-1]

def glob_config(pattern, search_dirs=None):

"""Return glob results for all possible configuration locations.

Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory.

This is done for performance since this is usually used to find *all* configs for a certain component.

"""

patterns = config_search_paths(pattern, search_dirs=search_dirs,

check_exists=False)

for pattern_fn in patterns:

for path in glob.iglob(pattern_fn):

yield path

Either way, it clearly uses iglob which isn't going to be sorted.

As for my option 3 @BENR0, it isn't that the test fails, it is that the failing test represents a use case (as @gerritholl says): which reader gets chosen when a user doesn't specify a reader (they don't know what one they want). With this PR they seem to always get the CF reader even though there is a much better choice.

@gerritholl besides alphabetical order, I'm not sure we want to go down the performance-lossy road of checking all readers for a match against a filename. We could implement something that knows that the CF reader and the generic image reader should be a last resort:

# pseudo code generic_choice = None for reader, reader_info in ...: if reader.matches(filename): if not reader_info["generic"]: return reader generic_choice = reader return generic_choice

I understand. This discussion reminded me of the conversation regarding overlapping file patterns in #913 which might be related.

Similar, but they are separate bits of code. The overlapping file patterns will likely always be a little difficult. This reader choice one can have a solution, but it might slow some operations down.

I exchanged "filename" for "stem" which obviously was the easy part. @djhoese @mraspaud @sfinkens @gerritholl What is the proposed way, based on the discussion above, to move this forward?

mraspaud · 2023-01-09T12:02:19Z

satpy/etc/readers/satpy_cf_nc.yaml

@@ -17,3 +17,4 @@ file_types:
        file_patterns:
         - '{platform_name}-{sensor}-{resolution_type}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
         - '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'
+         - '{filename}.nc'


fine by me.

coveralls · 2023-05-25T13:12:41Z

Pull Request Test Coverage Report for Build 9516196907

Details

0 of 0 changed or added relevant lines in 0 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage remained the same at 96.04%

Totals
Change from base Build 9513521871:	0.0%
Covered Lines:	51561
Relevant Lines:	53687

💛 - Coveralls

djhoese · 2023-05-25T13:21:02Z

satpy/etc/writers/cf.yaml

@@ -2,6 +2,6 @@ writer:
  name: cf
  description: Generic netCDF4/CF Writer
  writer: !!python/name:satpy.writers.cf_writer.CFWriter
-  filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc'
+  filename: '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'


Not sure it was mentioned in the comments, but does anyone know/remember where this - separate filename came from? As you mentioned it seems to have existed in the reader first, but it isn't clear to me why/where from.

I think my question here still stands, but not sure we're going to get anyone who has a real answer.

djhoese · 2023-05-25T14:22:51Z

I don't use find_files_and_readers so one of the other maintainers or users will have to decide on the {stem}.nc file pattern added to the reader. Since this would match any netcdf file input I could see it breaking a lot of users workflows so I'm hesitant to keep it. The rest of this PR I think I'm fine with.

BENR0 · 2024-06-14T12:36:00Z

I removed the stem from the filename patterns for the reader again because at this point it is causing to much trouble and should be solved in a separate PR. I will add an issue. This is ready to go then.

djhoese

LGTM

djhoese · 2024-06-17T18:30:11Z

satpy/etc/writers/cf.yaml

@@ -2,6 +2,6 @@ writer:
  name: cf
  description: Generic netCDF4/CF Writer
  writer: !!python/name:satpy.writers.cf_writer.CFWriter
-  filename: '{name}_{start_time:%Y%m%d_%H%M%S}.nc'
+  filename: '{platform_name}-{sensor}-{start_time:%Y%m%d%H%M%S}-{end_time:%Y%m%d%H%M%S}.nc'


I think my question here still stands, but not sure we're going to get anyone who has a real answer.

BENR0 added 2 commits April 12, 2021 16:13

refactor: default filename pattern

5f8d072

refactor: add instrument to root attrs

6666c4b

djhoese closed this May 15, 2021

Merge branch 'main' into feat_default_filename_cf_writer

1898acc

refactor: add generic filename to reader yaml

f123e23

djhoese reopened this Aug 19, 2022

djhoese added enhancement code enhancements, features, improvements component:readers labels Aug 19, 2022

djhoese requested changes Aug 19, 2022

View reviewed changes

satpy/writers/cf_writer.py Outdated Show resolved Hide resolved

satpy/writers/cf_writer.py Outdated Show resolved Hide resolved

satpy/etc/readers/satpy_cf_nc.yaml Outdated Show resolved Hide resolved

BENR0 added 2 commits August 22, 2022 10:21

refactor: remove adding instrument to root attrs

fd2669e

fix: remove unnecessary code introduced by merge

612b738

djhoese approved these changes Aug 22, 2022

View reviewed changes

fix: add reader name to test

c9476f5

BENR0 mentioned this pull request Oct 7, 2022

satpy_cf_nc reader could be more liberal in filename acceptance #2223

Open

Merge branch 'main' into feat_default_filename_cf_writer

e82d4fd

BENR0 marked this pull request as ready for review October 13, 2022 13:48

BENR0 requested a review from mraspaud as a code owner October 13, 2022 13:48

gerritholl reviewed Jan 5, 2023

View reviewed changes

mraspaud reviewed Jan 9, 2023

View reviewed changes

mraspaud assigned BENR0 Jan 23, 2023

BENR0 added 2 commits May 25, 2023 14:27

Merge branch 'main' into feat_default_filename_cf_writer

8d076ac

refactor: change filename to stem

a3af13d

djhoese reviewed May 25, 2023

View reviewed changes

refactor: remove stem.nc from yaml config again

82b858e

djhoese approved these changes Jun 17, 2024

View reviewed changes

mraspaud merged commit cade277 into pytroll:main Jun 24, 2024
18 of 19 checks passed

	def config_search_paths(filename, search_dirs=None, **kwargs):
	"""Get series of configuration base paths where Satpy configs are located."""
	if search_dirs is None:
	search_dirs = get_config_path_safe()[::-1]

	paths = [filename, os.path.basename(filename)]
	paths += [os.path.join(search_dir, filename) for search_dir in search_dirs]
	paths += [os.path.join(PACKAGE_CONFIG_PATH, filename)]
	paths = [os.path.abspath(path) for path in paths]

	if kwargs.get("check_exists", True):
	paths = [x for x in paths if os.path.isfile(x)]

	paths = list(OrderedDict.fromkeys(paths))
	# flip the order of the list so builtins are loaded first
	return paths[::-1]


	def glob_config(pattern, search_dirs=None):
	"""Return glob results for all possible configuration locations.

	Note: This method does not check the configuration "base" directory if the pattern includes a subdirectory.
	This is done for performance since this is usually used to find all configs for a certain component.
	"""
	patterns = config_search_paths(pattern, search_dirs=search_dirs,
	check_exists=False)
	for pattern_fn in patterns:
	for path in glob.iglob(pattern_fn):
	yield path

Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637

Change default filename for cf writer to be compatible with satpy_cf_nc reader #1637

Conversation

BENR0 commented Apr 13, 2021

djhoese commented May 15, 2021

BENR0 commented May 17, 2021

djhoese commented May 17, 2021

BENR0 commented May 17, 2021

djhoese commented May 17, 2021

mraspaud commented May 17, 2021

sfinkens commented May 18, 2021

BENR0 commented May 18, 2021

sfinkens commented May 18, 2021

BENR0 commented May 25, 2022

BENR0 commented Aug 16, 2022 • edited Loading

djhoese left a comment

Choose a reason for hiding this comment

djhoese commented Aug 22, 2022

djhoese commented Aug 22, 2022

codecov bot commented Aug 22, 2022 • edited Loading

Codecov Report

coveralls commented Aug 22, 2022 • edited Loading

BENR0 commented Oct 13, 2022

gerritholl commented Oct 13, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mraspaud left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gerritholl Feb 3, 2023 • edited Loading

Choose a reason for hiding this comment

djhoese Feb 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

coveralls commented May 25, 2023 • edited Loading

Pull Request Test Coverage Report for Build 9516196907

Details

💛 - Coveralls

Choose a reason for hiding this comment

Choose a reason for hiding this comment

djhoese commented May 25, 2023

BENR0 commented Jun 14, 2024

djhoese left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

BENR0 commented Aug 16, 2022 •

edited

Loading

codecov bot commented Aug 22, 2022 •

edited

Loading

coveralls commented Aug 22, 2022 •

edited

Loading

gerritholl Feb 3, 2023 •

edited

Loading

djhoese Feb 3, 2023 •

edited

Loading

coveralls commented May 25, 2023 •

edited

Loading