Create (filtered) datacatalogs of harvested datasetdescriptions #858

Open
coret opened this issue Jan 25, 2024 · 3 comments

@coret
Contributor

coret commented Jan 25, 2024

The set of harvested and converted datasetdescriptions could also be published as one or more "NDE Datasetregister Catalogs" (in DCAT, as that is the model we use in the triplestore). One catalog could simply be the set of all datasets (= unfiltered).

This could also benefit aggregators/harvesters like CLARIAH, Europeana and data.overheid.nl if we introduce filters that limit the datasets in a catalog, e.g. to (a set of) publisher(s). A catalog could be "published" via the Datasetregister API or as static files, where the DCAT data catalog is the result of a SPARQL query with a configured filter (maybe via .rq files?).
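A minimal sketch of the "SPARQL query with a configured filter" idea, assuming the triplestore exposes datasets as `dcat:Dataset` with `dct:title` and `dct:publisher`. The function name `build_catalog_query`, the catalog IRI and the publisher IRIs are hypothetical and not part of the Dataset Register; the generated text is what could be stored in a `.rq` file.

```python
from typing import Iterable

PREFIXES = """\
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct: <http://purl.org/dc/terms/>
"""


def build_catalog_query(catalog_iri: str, publishers: Iterable[str] = ()) -> str:
    """Return a SPARQL CONSTRUCT query that yields one dcat:Catalog.

    Without publishers the catalog is unfiltered (all datasets); otherwise
    the query is restricted to datasets from the given publisher IRIs.
    """
    publisher_iris = " ".join(f"<{iri}>" for iri in publishers)
    # Hypothetical filter: only emitted when at least one publisher is configured.
    values = f"  VALUES ?publisher {{ {publisher_iris} }}\n" if publisher_iris else ""
    return (
        f"{PREFIXES}"
        "CONSTRUCT {\n"
        f"  <{catalog_iri}> a dcat:Catalog ;\n"
        "    dcat:dataset ?dataset .\n"
        "  ?dataset a dcat:Dataset ;\n"
        "    dct:title ?title ;\n"
        "    dct:publisher ?publisher .\n"
        "}\n"
        "WHERE {\n"
        f"{values}"
        "  ?dataset a dcat:Dataset ;\n"
        "    dct:title ?title ;\n"
        "    dct:publisher ?publisher .\n"
        "}\n"
    )


if __name__ == "__main__":
    # Example IRIs are placeholders, not real catalog or publisher identifiers.
    print(build_catalog_query(
        "https://example.org/catalog/filtered",
        ["https://example.org/org/1"],
    ))
```

Calling the function without publishers gives the unfiltered "all datasets" catalog; the result of either query could be served by an API endpoint or written to a static file.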

These datacatalogs increase the findability of heritage datasets.

@coret
Contributor Author

coret commented Jan 25, 2024

Another way to create the filters is to let dataset publishers define the intended audience in the datasetdescription. The Dataset Register could then generate an audience-specific datacatalog (for Europeana, CLARIAH, DONL, ...).
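A rough sketch of audience-based grouping, assuming each harvested datasetdescription has been reduced to a dictionary with an `id` and an optional `audience` list. Both keys and the function name `catalogs_by_audience` are hypothetical; the thread does not say which property would carry the audience.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def catalogs_by_audience(descriptions: Iterable[dict]) -> Dict[str, List[str]]:
    """Group dataset ids into one catalog per declared audience."""
    catalogs: Dict[str, List[str]] = defaultdict(list)
    for description in descriptions:
        # A dataset without an audience ends up in no audience-specific catalog.
        for audience in description.get("audience", []):
            catalogs[audience].append(description["id"])
    return dict(catalogs)


if __name__ == "__main__":
    # Placeholder descriptions, not real registered datasets.
    sample = [
        {"id": "dataset-1", "audience": ["Europeana", "CLARIAH"]},
        {"id": "dataset-2", "audience": ["CLARIAH"]},
        {"id": "dataset-3"},
    ]
    print(catalogs_by_audience(sample))
```

A dataset that names several audiences appears in several catalogs, which is the many-to-many case raised later in the thread.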

@ddeboer
Member

ddeboer commented Jan 25, 2024

Note that we currently assume a dataset is part of a single catalog. If I understand you correctly, the relation dataset–catalog would become many–many.

@coret
Contributor Author

coret commented Jan 26, 2024

> Note that we currently assume a dataset is part of a single catalog.

Where does this assumption come from, or where is it coded?
A heritage organisation can register one or more data catalogs, and these do not have to be disjoint in terms of datasets. I can imagine that when a dataset from catalog B is processed and that dataset was also part of catalog A, the link between the dataset and catalog A would be overwritten by a link to catalog B? That is, unless the dataset has a schema:includedInDataCatalog (which has cardinality 0..n) listing both catalog A and catalog B.
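To make the suspected overwrite concrete, here is a small sketch contrasting a single-catalog mapping with a many-to-many one. It is not the Dataset Register's actual code; `link_single`, `link_many` and the `(dataset, catalog)` pairs are hypothetical and only illustrate the two assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Link = Tuple[str, str]  # (dataset id, catalog id), in processing order


def link_single(links: Iterable[Link]) -> Dict[str, str]:
    """One catalog per dataset: a later catalog replaces an earlier one."""
    catalog_of: Dict[str, str] = {}
    for dataset, catalog in links:
        catalog_of[dataset] = catalog  # overwrites any previous link
    return catalog_of


def link_many(links: Iterable[Link]) -> Dict[str, Set[str]]:
    """Many catalogs per dataset: every link is kept."""
    catalogs_of: Dict[str, Set[str]] = defaultdict(set)
    for dataset, catalog in links:
        catalogs_of[dataset].add(catalog)
    return dict(catalogs_of)


if __name__ == "__main__":
    processed = [("dataset-1", "catalog-A"), ("dataset-1", "catalog-B")]
    print(link_single(processed))  # only the last processed catalog survives
    print(link_many(processed))    # both catalogs are retained
```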

> If I understand you correctly, the relation dataset–catalog would become many–many.

In theory yes, but I wonder whether we should add triples at all. Why not just generate files (which are easier for harvesters/aggregators to process)?
