Related Dataset #9

Open
juhahakala opened this issue Jun 29, 2021 · 9 comments
Comments

@juhahakala
Collaborator

juhahakala commented Jun 29, 2021

Proposed DCMI Metadata Terms: http://purl.org/dc/terms/relatedDataset

Label: Related Dataset

Dataset referenced in the described resource.

SRAP: Dataset referenced in the described scholarly resource.

Recommended practice is to identify the dataset with a URI identifying either the dataset or a landing page through which the dataset is accessed.

<dcterms:relatedDataset>https://doi.org/10.17605/OSF.IO/B6KJZ</dcterms:relatedDataset>
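A minimal Python sketch of how a record might serialize this proposed element, using only the standard library's xml.etree.ElementTree. It assumes the dcterms prefix is bound to http://purl.org/dc/terms/; the helper name related_dataset_element is hypothetical, and relatedDataset itself is only a proposed term, not part of DCMI Metadata Terms.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"
# Bind the "dcterms" prefix so the serialized tag reads dcterms:relatedDataset.
ET.register_namespace("dcterms", DCTERMS)


def related_dataset_element(uri: str) -> ET.Element:
    """Build a (proposed) dcterms:relatedDataset element holding a dataset or landing-page URI."""
    elem = ET.Element(f"{{{DCTERMS}}}relatedDataset")
    elem.text = uri
    return elem


xml = ET.tostring(
    related_dataset_element("https://doi.org/10.17605/OSF.IO/B6KJZ"),
    encoding="unicode",
)
print(xml)
```

Because ElementTree serializes the element on its own, the output also carries the xmlns:dcterms declaration that a fragment embedded in a larger record would normally inherit from its parent.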

-- Discussion --

The URI will usually be based on a persistent identifier (PID) such as a DOI, as in the example.

DataCite DOIs resolve to a landing page, which may contain URI links to 1-n manifestations of the dataset. Work-level citation should therefore not be a problem in this case.
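Since the recommended value is a URI, and DOIs circulate in several written forms, a cataloguing tool might normalize them to the https://doi.org/ resolver form before storing them. A minimal Python sketch under that assumption; doi_to_uri is a hypothetical helper, and its prefix list covers only a few common forms.

```python
def doi_to_uri(doi: str) -> str:
    """Normalize a DOI written in a common form into an https://doi.org/ URI."""
    value = doi.strip()
    # Strip a resolver prefix or the "doi:" scheme (compared case-insensitively).
    for prefix in ("https://doi.org/", "http://doi.org/",
                   "https://dx.doi.org/", "http://dx.doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    # Every DOI begins with the "10." directory indicator.
    if not value.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return "https://doi.org/" + value


for form in ("10.17605/OSF.IO/B6KJZ", "doi:10.17605/OSF.IO/B6KJZ",
             "http://dx.doi.org/10.17605/OSF.IO/B6KJZ"):
    print(doi_to_uri(form))
```

All three inputs normalize to the same URI, which then resolves to the DataCite landing page.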

@HughP

HughP commented Feb 2, 2023

How is this different from the dct:references property? What makes something like a supporting dataset necessary, as opposed to, say, a supporting magazine? In what way is the supporting dataset defined? I mean, what exactly is the bibliographic relationship, given something like Tillett's list of relationships?

@juhahakala
Collaborator Author

Related Dataset and Related Code are both subproperties of dct:references. With these properties it is possible to provide links to research datasets and applications that were essential in the creation of the described scholarly resource. dct:references may be used for this purpose as well, but unlike Related Dataset and Related Code, it does not reveal the nature of the linked object.
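The subproperty relationship can be made concrete with a small sketch of the RDFS entailment rule for rdfs:subPropertyOf (rule rdfs7), written in plain Python over a set of triples rather than with an RDF library. The article URI is hypothetical, and relatedDataset is the proposed term from this issue.

```python
DCT = "http://purl.org/dc/terms/"
# Subproperty declarations; relatedDataset is the term proposed in this issue.
SUBPROPERTY_OF = {DCT + "relatedDataset": DCT + "references"}


def entail_superproperties(triples):
    """Apply rdfs7: (s, p, o) plus p subPropertyOf q entails (s, q, o)."""
    inferred = set(triples)
    for s, p, o in triples:
        sup = SUBPROPERTY_OF.get(p)
        while sup is not None:  # follow the chain upward (assumes no cycles)
            inferred.add((s, sup, o))
            sup = SUBPROPERTY_OF.get(sup)
    return inferred


# Hypothetical article URI; the DOI is the one from the proposal above.
article = "http://example.org/article1"
data = {(article, DCT + "relatedDataset", "https://doi.org/10.17605/OSF.IO/B6KJZ")}
for triple in sorted(entail_superproperties(data)):
    print(triple)
```

A consumer that only understands dct:references still sees the link; only the more specific property tells it that the target is a dataset.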

@HughP

HughP commented Apr 5, 2023

@juhahakala,

Several comments:

  1. I understand the desire to create relationships to code and datasets that are essential to the main object of description, but why would a schema, ontology, or application profile specify them overtly? We have dct:source, which can be used in conjunction with dct:references. Presumably the object of description would be adequately described with a dct:source relationship, and the source resource would have a DCMIType indicator. In this way, generic relationships could be used, and the type of the source would be identified with the DCMIType vocabulary.
  2. Why would the term Related Code be used instead of Related Software? The term Software is already in use within the DCT namespace, so introducing another lexical element seems only to bring ambiguity. Is all software code? Is all code software?
  3. The semantics of Related differ from those of Source in natural English. Related can have a very broad meaning: Mercurial is software related to Git, but there is no source relationship between them.
  4. There are many reasons to reference something, and a source relationship is only one of them. CiTO (the Citation Typing Ontology) identifies others, and secondary research on CiTO identifies even more citation types.

@kcoyle
Collaborator

kcoyle commented Apr 6, 2023

I agree with Hugh. We should think about the best way to scale this type of information. There could be many different kinds of related resources, and I don't think we want to create properties for all of them. dct:references recommends non-literal values, which could themselves be given a dct:Type class. The latter has both Dataset and Software. In that way, the type is a characteristic of the referenced entity, not of the predicate. And in fact, depending on what cataloging has been done, the referenced entity may already be described with a type.
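The pattern suggested here, one generic predicate with the type attached to the referenced entity, can be sketched in plain Python over a set of triples. The example.org URIs are hypothetical; the DCMI Type classes Dataset and Software are the ones named above, and the helper referenced_of_type is illustrative only.

```python
DCT = "http://purl.org/dc/terms/"
DCMITYPE = "http://purl.org/dc/dcmitype/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs: one generic predicate, with the type stated on each target.
ARTICLE = "http://example.org/article1"
graph = {
    (ARTICLE, DCT + "references", "http://example.org/data1"),
    (ARTICLE, DCT + "references", "http://example.org/code1"),
    ("http://example.org/data1", RDF_TYPE, DCMITYPE + "Dataset"),
    ("http://example.org/code1", RDF_TYPE, DCMITYPE + "Software"),
}


def referenced_of_type(graph, subject, type_uri):
    """Return the resources `subject` references whose own rdf:type is `type_uri`."""
    refs = {o for s, p, o in graph if s == subject and p == DCT + "references"}
    return sorted(r for r in refs if (r, RDF_TYPE, type_uri) in graph)


print(referenced_of_type(graph, ARTICLE, DCMITYPE + "Dataset"))
print(referenced_of_type(graph, ARTICLE, DCMITYPE + "Software"))
```

Supporting a new kind of related resource then means adding a type statement about the target, not a new predicate.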

@juhahakala
Collaborator Author

In the 32nd meeting, we decided to use dct:source to describe resources that played an essential role in the production of the described resource. The proposed elements RelatedCode and RelatedDataSet will be dropped. Instead, dct:source will be linked to the type of the source material. This change opens two additional tasks. First, a controlled vocabulary of source materials is required. Software and dataset are obvious choices, but they may not be enough. Be that as it may, adding new terms to the SRAP source type vocabulary will be easier than adding new properties to the SRAP itself. Second, it is necessary to specify the syntax for linking the source type to the source specification.
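The trade-off described here, extending a vocabulary rather than the property set, can be sketched as a validation step. This is a minimal Python sketch under stated assumptions: the SRAP source type vocabulary had not been defined at this point, so its contents, the use of DCMI Type URIs for software and dataset, and the example.org survey-form URI are all hypothetical. The linking syntax (the second open task) is deliberately left out; sources are shown as a simple mapping from URI to type.

```python
# Hypothetical SRAP source type vocabulary. The thread names only software and
# dataset; using the DCMI Type URIs for them is an assumption.
SOURCE_TYPES = {
    "http://purl.org/dc/dcmitype/Dataset",
    "http://purl.org/dc/dcmitype/Software",
}


def unrecognized_sources(sources: dict[str, str], vocabulary: set[str]) -> list[str]:
    """Return dct:source targets whose declared type is outside the vocabulary."""
    return sorted(uri for uri, type_uri in sources.items() if type_uri not in vocabulary)


# Hypothetical dct:source targets of one described resource, mapped to their types.
sources = {
    "https://doi.org/10.17605/OSF.IO/B6KJZ": "http://purl.org/dc/dcmitype/Dataset",
    "http://example.org/survey-form": "http://purl.org/dc/dcmitype/Text",
}
print(unrecognized_sources(sources, SOURCE_TYPES))

# Extending the vocabulary is a data change; the dct:source property is untouched.
SOURCE_TYPES.add("http://purl.org/dc/dcmitype/Text")
print(unrecognized_sources(sources, SOURCE_TYPES))
```

The first call flags the survey form; after one vocabulary term is added, the same records validate without any change to the profile's properties.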

@HughP

HughP commented Aug 7, 2023 via email

@kcoyle
Collaborator

kcoyle commented Aug 8, 2023

@HughP It isn't intended to be a textual description - the related file will be located somewhere with a URL. Ideally, that file will be described with its own metadata, thus constituting a "description". There are a number of different existing metadata schemes that have types that we could use. What we haven't discussed is whether SRAP would define how such files might be described. I'll try to mock up an example.

I see this as different from CiTO because the intention here is that these are files that are essential parts of the scholarly work itself, and which can be "published" simultaneously with the article in digital form. I now begin to wonder if this implies a way to package the article and these supporting files together, a kind of directory that would cause them to always be retrieved together. That implies a stronger relationship than dct:source but presumably could be implemented in software.

@freddy-sumba
Contributor

After reviewing the discussion on how to represent related datasets, I suggest using dct:relation flexibly, with a controlled vocabulary to specify the nature of the relationship. This approach allows for identifying various relationships (like related data, associated software, etc.) clearly and efficiently.

Example:

Imagine we have Dataset A used to develop a machine learning model in a study. Dataset B is a related dataset generated as an outcome of the study. This relationship could be represented as:

@prefix dct: <http://purl.org/dc/terms/> .

<http://example.org/datasetA>
    dct:relation [
        a dct:Type ;
        dct:identifier <http://example.org/datasetB> ;
        dct:description "Dataset B generated as an outcome of the analysis of Dataset A." ;
    ] .

In this example, dct:Type would be part of a controlled vocabulary that specifies Dataset B as a direct outcome of Dataset A, providing clarity about the relationship between the two datasets and enhancing interoperability.

@osma
Collaborator

osma commented Sep 10, 2024

The new dct:relation now supports this, if you use it to point to a SRAPResource that is a dataset (and provide a COAR Type to indicate that it's a dataset).

We just need better guidance in the SRAP specification on how to do this.


5 participants