Using STAC for data discovery? #64

Closed
m-mohr opened this issue Mar 27, 2018 · 13 comments

@m-mohr
Member

m-mohr commented Mar 27, 2018

Simon from Google drew my attention to STAC (SpatioTemporal Asset Catalog) and STAM (SpatioTemporal Asset Metadata). Both are new, evolving standards that we could take into consideration for data discovery. At the very least, they are a good source to check our own standard against, e.g. to add a license field to our data set description. They use JSON structures, too, which makes them a better fit for our own JSON-based API. I'll go through them now and see what we might want to adopt.

@m-mohr
Member Author

m-mohr commented Mar 27, 2018

STAM seems to be mostly about individual images. That doesn't really fit our purpose. Nevertheless, there are some ideas I got from STAM:

  • We should include an optional license field. Should be standardized based on a pre-defined list, e.g. https://opensource.org/licenses/alphabetical or SPDX License List.
  • We might want to include an optional field to trace back the original data from derived data. At least for me this sounds like an interesting idea, taken from Derived Data Traceability radiantearth/stam-spec#13 .
  • Could be useful to optionally add the platform, e.g. "landsat-8", "sentinel-2a", ..., and maybe the sensor, e.g. "modis", "aster", .... This allows better ways to compare data. Currently the information is hidden in the name/description.
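To make the three ideas above concrete, here is a minimal sketch of a data set description with the optional fields added. All field names (`license`, `derived_from`, `platform`, `sensor`) and the class itself are hypothetical and not taken from the openEO, STAC or STAM specs, and the SPDX set is only a tiny illustrative subset; a real check would load the full list.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Tiny illustrative subset of SPDX identifiers (assumption: a real
# implementation would load the complete SPDX License List instead).
KNOWN_SPDX_IDS = {"CC-BY-4.0", "CC0-1.0", "Apache-2.0", "MIT"}


@dataclass
class DatasetDescription:
    # Hypothetical structure; only the *ideas* come from the discussion above.
    data_id: str
    description: str
    license: Optional[str] = None                           # SPDX identifier
    derived_from: List[str] = field(default_factory=list)   # ids of source data
    platform: Optional[str] = None                          # e.g. "landsat-8"
    sensor: Optional[str] = None                            # e.g. "modis"

    def validate(self) -> None:
        """Reject license values that are not in the pre-defined list."""
        if self.license is not None and self.license not in KNOWN_SPDX_IDS:
            raise ValueError(f"unknown license identifier: {self.license}")

    def to_json(self) -> str:
        """Serialize, leaving out optional fields that were not set."""
        doc = {k: v for k, v in asdict(self).items() if v not in (None, [])}
        return json.dumps(doc, indent=2)


example = DatasetDescription(
    data_id="example-collection",
    description="Placeholder data set used for illustration only",
    license="CC-BY-4.0",
    platform="sentinel-2a",
)
example.validate()
print(example.to_json())
```

Keeping every new field optional means existing descriptions stay valid, while clients that care about license or platform can filter on them instead of parsing the name/description.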

@m-mohr m-mohr changed the title Using STAM and STAC for data discovery Using STAM and STAC for data discovery? Mar 27, 2018
@m-mohr
Member Author

m-mohr commented Mar 27, 2018

STAC seems to be too much of a catalogue. For example, it requires listing asset objects that can be downloaded, which is not suitable for us. The same might apply to thumbnails.

What they are missing are band descriptions. What we are missing are properties like cloud cover, resolution, etc. (see their spec.)

Seems like adopting these two standards will not be sufficient for us...

@nuest

nuest commented Mar 27, 2018

If you want a useful license field, I can recommend the data from the Open Licenses Service, nice JSON with identifiers (SPDX) and names: http://licenses.opendefinition.org/

Alternatively: https://github.com/sindresorhus/spdx-license-list

@edzer
Member

edzer commented Mar 27, 2018

Maybe ask @cholmes: Chris, do you know whether STAC/STAM provide data descriptions at the collection level?

@m-mohr
Member Author

m-mohr commented Mar 27, 2018

Simon just mentioned that some eo-related changes to STAC are in the dev branch, including bands. See this example, which includes band information. Nevertheless, it's still all about files and not image collections?!

Another interesting option is the OpenSearch EO Extension with its new GeoJSON Encoding as mentioned in #68.

@cholmes

cholmes commented Mar 27, 2018

The landsat example is out of date. We discussed a plan for bands, and @matthewhanson is working on the EO profile. It will have a more sensible approach for bands, and it'd be great to collaborate with you all on it. The notes on what we're planning to do are at https://github.com/radiantearth/community-sprints/blob/master/03072018-ft-collins-co/notes/stac-eo.md#asset-definition

Though it's just a rough sketch, I'd wait till Matt is able to get up the EO profile, which really necessitates the band descriptions and other collection level stuff. Our goal is really search of actual data, but we want to not repeat all the collection data at an individual record level, so it will have a place where that common stuff can be defined.

Not sure what you mean by 'it's still all about files and not image collections' - it's links to assets, those don't have to be actual files online, but it is a reference to something that you can download. Or does Open-eo just provide search at the collection level, not individual collects? How do you actually acquire an image?

I saw open-eo presented last week at the OGC meeting, looks like a great effort. And would be great to align, though may be better to talk than try to figure it out in tickets.

I've looked at the OpenSearch EO Extension GeoJSON encoding, there's decent documentation of it. My hope is that we can use JSON-LD type structures to share some of the naming, though we'll likely do a small subset of all that they have.

@cholmes

cholmes commented Mar 27, 2018

And +1 to SPDX license list, that's what we specified. Though the one downside is that it doesn't give much guidance for non open licenses, and we do want to expose that data for search as well.

@cholmes

cholmes commented Mar 27, 2018

Do you all have band descriptions? We'd be happy to share definitions on those; we're just about to put the first version of that out. We definitely need them for the EO profile.

@edzer
Member

edzer commented Mar 27, 2018

Thanks! +1 on discussing in person first; will send you an email.

@m-mohr
Member Author

m-mohr commented Mar 28, 2018

Good morning @cholmes,
thanks for all the information, highly appreciated. I looked through the examples and meeting notes, and it is now much clearer where you are heading. It seems to be the right direction, also for our use case. The old examples gave me the wrong impression, I think. Looking forward to discussing things in person.

@m-mohr m-mohr changed the title Using STAM and STAC for data discovery? Using STAC for data discovery? Mar 29, 2018
@m-mohr m-mohr added this to the v0.4 milestone Apr 4, 2018
@m-mohr
Member Author

m-mohr commented Apr 12, 2018

STAC made good progress and released version 0.4. At the moment it is not yet at a stage which fulfills our requirements, but there are ideas and plans that would allow us to use STAC. See radiantearth/stac-spec#81 for an important discussion.

Unfortunately, there is this part of the spec: "All static catalogs must contain at least 1 Asset, as the point of the SpatioTemporal Asset Catalog is to be link to actual actual data, not to just reference metadata (though it is not required that all users have permissions to access the asset).". We are currently just referencing metadata. Providers might want to link to their assets, but some might not want to do that.
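To illustrate the mismatch, here is a rough sketch of what such a check could look like. It assumes, purely for illustration, that a record is a JSON object with an `assets` member; the function name and the example records are hypothetical and not taken from the STAC spec.

```python
from typing import Any, Mapping


def has_asset(record: Mapping[str, Any]) -> bool:
    """Return True if the record links to at least one asset.

    Assumption: assets live under an "assets" key holding a non-empty
    list or object. This only mirrors the quoted "at least 1 Asset" rule.
    """
    return bool(record.get("assets"))


# What we currently provide: metadata only, no downloadable asset.
metadata_only = {"id": "example", "properties": {"license": "CC-BY-4.0"}}

# What the quoted rule asks for: at least one link to actual data.
with_asset = {
    "id": "example",
    "assets": {"data": {"href": "https://example.com/scene.tif"}},
}

print(has_asset(metadata_only))
print(has_asset(with_asset))
```

Under this reading, our metadata-only descriptions would fail the requirement unless providers choose to add asset links.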

@m-mohr
Member Author

m-mohr commented Aug 13, 2018

Another idea is to be compatible with WFS 3.0, which has also been adopted by STAC. This would mean renaming /data to /collections, but that shouldn't be a problem. Example: https://cmr-stac-api.dev.element84.com/docs/index.html That document also has some other nice ideas, e.g. how to define the temporal reference system.
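If we wanted to keep old clients working during the transition, the rename from /data to /collections could be handled with a simple path mapping. This is only a sketch: the function name is hypothetical and how it would be hooked into a back-end's routing is left open.

```python
def to_collections_path(path: str) -> str:
    """Map a legacy /data path onto the /collections layout.

    Only the exact prefix "/data" (optionally followed by "/...") is
    rewritten, so unrelated paths such as "/database" are left alone.
    """
    legacy = "/data"
    if path == legacy or path.startswith(legacy + "/"):
        return "/collections" + path[len(legacy):]
    return path


for p in ["/data", "/data/example-collection", "/processes"]:
    print(p, "->", to_collections_path(p))
```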

@m-mohr
Member Author

m-mohr commented Sep 4, 2018

Will be handled with #114.

@m-mohr m-mohr closed this as completed Sep 4, 2018

4 participants