Update STAC Metadata with Proposal #77

Open. zacharyDez wants to merge 7 commits into main.

zacharyDez
Collaborator

What I Changed

  • Added a STAC item for the space2stats_population_2020 dataset, representing population data in H3 hexagons.
  • Implemented the Scientific Extension, introducing the sci:citation attribute to the item properties. We also leave room for future inclusion of sci:doi and sci:publications when applicable.
  • Incorporated the Themes Extension, tagging the item with themes: ["Demographics", "Population"].
  • Updated the catalog.json to link to the new STAC item, creating a 1:N relationship between the catalog and items, where each item corresponds to a specific data source.
  • Moved metadata from sources.json into each STAC item for a more streamlined and centralized structure.
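
As a rough illustration of the item described above, here is a minimal sketch built with the standard library only. The footprint, timestamp, citation text, and extension schema URL are assumptions for illustration; only the item ID, the `sci:citation` field name, and the theme values come from this PR.

```python
import json

# Assumed schema URL for the Scientific Extension; the version is not stated in the PR.
SCIENTIFIC_EXT = "https://stac-extensions.github.io/scientific/v1.0.0/schema.json"


def build_population_item(citation: str) -> dict:
    """Return a STAC item dict for the population dataset (illustrative values)."""
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "stac_extensions": [SCIENTIFIC_EXT],
        "id": "space2stats_population_2020",
        # A global footprint is an assumption; the real item may differ.
        "bbox": [-180.0, -90.0, 180.0, 90.0],
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [-180.0, -90.0], [180.0, -90.0], [180.0, 90.0],
                [-180.0, 90.0], [-180.0, -90.0],
            ]],
        },
        "properties": {
            "datetime": "2020-01-01T00:00:00Z",  # placeholder timestamp
            "sci:citation": citation,
            "themes": ["Demographics", "Population"],
        },
        "links": [],
        "assets": {},
    }


item = build_population_item("Placeholder citation for the population source")
print(json.dumps(item["properties"], indent=2))
```

`sci:doi` and `sci:publications` are left out here, matching the description: they can be added to `properties` later without changing the rest of the item.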

How to Test It

  • Run stac-check to validate the new item against the STAC specification and its extensions (Table and Scientific).
  • Perform a visual inspection to ensure the metadata is clear and aligns with the intended schema.
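
Before running stac-check, a quick structural pre-check can catch obviously incomplete items. The sketch below is not a substitute for stac-check or schema validation; the function name and the sample item are hypothetical, and the check only looks for the top-level item fields plus the two properties this PR relies on.

```python
from typing import List

# Top-level fields every STAC item is expected to carry.
REQUIRED_ITEM_FIELDS = (
    "type", "stac_version", "id", "geometry", "properties", "links", "assets",
)


def missing_item_fields(item: dict) -> List[str]:
    """List required fields that are absent from a STAC item dict."""
    missing = [field for field in REQUIRED_ITEM_FIELDS if field not in item]
    properties = item.get("properties", {})
    # The datetime key must be present; sci:citation is the field added in this PR.
    for key in ("datetime", "sci:citation"):
        if key not in properties:
            missing.append(f"properties.{key}")
    return missing


# Hypothetical, deliberately incomplete item used only to exercise the check.
sample = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "space2stats_population_2020",
    "geometry": None,
    "properties": {"datetime": "2020-01-01T00:00:00Z"},
    "links": [],
}
print(missing_item_fields(sample))
```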

How I Changed It

Structure

  • Catalog and Items: We propose maintaining a single catalog but introducing individual STAC items for each data source. There is a 1:N relationship between the catalog and its items.
  • Metadata in Items: Information previously contained in sources.json is now directly embedded within each item, making each item self-contained and easier to manage.
  • Single Data Source per Item: Each STAC item represents one data source, simplifying the organization and management of metadata.
  • Multiple Themes per Item: Each item can have multiple themes (e.g., "Demographics," "Population"), allowing for efficient searching across items by theme.
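
The 1:N layout and theme-based lookup can be sketched as follows, using plain dictionaries. The catalog ID, the file layout, and the second item are hypothetical and exist only to show how one catalog links to several items and how items can be filtered by theme.

```python
from typing import List


def build_catalog(item_ids: List[str]) -> dict:
    """Return a catalog dict with one 'item' link per data source."""
    return {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "space2stats-catalog",  # hypothetical catalog ID
        "description": "Space2Stats metadata catalog (illustrative)",
        "links": [
            {"rel": "item", "href": f"./{item_id}.json", "type": "application/json"}
            for item_id in item_ids
        ],
    }


def items_with_theme(items: List[dict], theme: str) -> List[str]:
    """Return the IDs of items whose 'themes' property contains the given theme."""
    return [
        item["id"]
        for item in items
        if theme in item.get("properties", {}).get("themes", [])
    ]


items = [
    {"id": "space2stats_population_2020",
     "properties": {"themes": ["Demographics", "Population"]}},
    # Hypothetical second data source, not part of this PR.
    {"id": "hypothetical_flood_exposure",
     "properties": {"themes": ["Hazards"]}},
]
catalog = build_catalog([item["id"] for item in items])
print(len(catalog["links"]), items_with_theme(items, "Population"))
```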

Scientific Extension

  • We added sci:citation to the item properties. In future updates, we can introduce additional scientific metadata fields such as sci:doi and sci:publications if relevant to the datasets.
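
One way to leave room for the optional fields is a small helper that always emits `sci:citation` and adds `sci:doi` and `sci:publications` only when values are supplied. This is a sketch, not the code in this PR: the helper name is hypothetical, and the DOI shown is a placeholder rather than a real identifier.

```python
from typing import Dict, List, Optional


def scientific_properties(
    citation: str,
    doi: Optional[str] = None,
    publications: Optional[List[Dict[str, str]]] = None,
) -> dict:
    """Build Scientific Extension properties, skipping fields with no value."""
    properties: dict = {"sci:citation": citation}
    if doi:
        properties["sci:doi"] = doi
    if publications:
        # Each publication is an object with its own 'doi' and/or 'citation'.
        properties["sci:publications"] = publications
    return properties


# Current state in the PR: citation only.
current = scientific_properties("Placeholder citation for the population source")

# Possible future state; the DOI below is a made-up placeholder.
future = scientific_properties(
    "Placeholder citation for the population source",
    doi="10.0000/placeholder",
    publications=[{"doi": "10.0000/placeholder-paper", "citation": "Placeholder paper"}],
)
print(sorted(current), sorted(future))
```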

Themes Extension and Future Considerations

  • While we’ve chosen a simple implementation of the Themes Extension for now, using basic tags, we may adopt a full Knowledge Organization System (KOS) in the future if the scale becomes problematic for managing themes across datasets.
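
If we do move to a KOS later, the existing tags could be converted mechanically. The sketch below assumes the object form described by the Themes Extension, where `themes` is a list of objects that each carry a `scheme` and a list of `concepts` with an `id`. The helper name and the scheme URL are hypothetical.

```python
from typing import List


def tags_to_scheme_themes(tags: List[str], scheme: str) -> List[dict]:
    """Wrap simple string tags in a single scheme/concepts theme object."""
    return [{"scheme": scheme, "concepts": [{"id": tag} for tag in tags]}]


simple_tags = ["Demographics", "Population"]
# Hypothetical vocabulary URL; a real KOS would supply its own.
themes = tags_to_scheme_themes(simple_tags, "https://example.org/space2stats/themes")
print(themes)
```

Because the conversion is a pure function of the tag list, items written with simple tags today would not need to be re-authored by hand.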

@zacharyDez
Collaborator Author

@andresfchamorro, I added a collection based on recommendations from some STAC folks internally. They also recommended that I test it with pystac:

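import pystac

# Load the root catalog and look up the collection by ID.
catalog = pystac.Catalog.from_file("catalog.json")
collection = catalog.get_child("space2stats-collection")
if collection is None:
    raise ValueError("space2stats-collection not found in catalog.json")

# Print each item's ID and properties.
for item in collection.get_items():
    print(f"Item ID: {item.id}")
    print(f"Item Properties: {item.properties}")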

This works as expected, and the stac-check runs all pass with no additional recommendations.

@zacharyDez
Collaborator Author

Created a file with the total population sum (total, m, f): s3://wbg-geography01/Space2Stats/parquet/GLOBAL/space2stats.parquet

@andresfchamorro
Collaborator

Thanks @zacharyDez! I really like this structure, especially how the content from sources can be managed at the item level. I'm OK with adding a collection level too.

One thing I'm missing is the current approach to building the STAC files. I see some tests were added, but is there a function that would create the JSON files from scratch? I was using a notebook, which was a bit janky, but at least it could be re-run as the dataset expands.

@zacharyDez
Collaborator Author

@andresfchamorro, I could help migrate your workflow to Python scripts that can be unit tested. pystac would probably be my recommendation for creating the STAC catalog, collections, and items.
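
To show the shape such a re-runnable script could take, here is a minimal sketch that regenerates the JSON files from a list of source records. It deliberately uses only the standard library instead of pystac so it runs anywhere, and it skips the collection level for brevity. The record layout, function name, catalog ID, and citation text are assumptions, not the project's actual code.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# Hypothetical source records; in practice these could come from sources.json.
SOURCES = [
    {
        "id": "space2stats_population_2020",
        "citation": "Placeholder citation for the population source",
        "themes": ["Demographics", "Population"],
    },
]


def write_stac(sources: List[dict], out_dir: Path) -> List[Path]:
    """Write one item file per source plus a catalog.json that links to them."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written, links = [], []
    for source in sources:
        item = {
            "type": "Feature",
            "stac_version": "1.0.0",
            "id": source["id"],
            "geometry": None,  # footprint omitted in this sketch
            "properties": {
                "datetime": "2020-01-01T00:00:00Z",  # placeholder timestamp
                "sci:citation": source["citation"],
                "themes": source["themes"],
            },
            "links": [],
            "assets": {},
        }
        item_path = out_dir / f"{source['id']}.json"
        item_path.write_text(json.dumps(item, indent=2))
        written.append(item_path)
        links.append({"rel": "item", "href": f"./{item_path.name}"})
    catalog = {
        "type": "Catalog",
        "stac_version": "1.0.0",
        "id": "space2stats-catalog",  # hypothetical catalog ID
        "description": "Space2Stats metadata catalog (illustrative)",
        "links": links,
    }
    catalog_path = out_dir / "catalog.json"
    catalog_path.write_text(json.dumps(catalog, indent=2))
    written.append(catalog_path)
    return written


with tempfile.TemporaryDirectory() as tmp:
    paths = write_stac(SOURCES, Path(tmp))
    names = sorted(path.name for path in paths)
    reloaded = json.loads((Path(tmp) / "catalog.json").read_text())
print(names)
```

Because the output is a pure function of the source records, a unit test can call `write_stac` on a temporary directory and assert on the files it produces, which is the property the notebook workflow lacked.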

2 participants