irregular use of distributions #1033

Open
EnnoMeijers opened this issue Nov 13, 2024 · 4 comments

Comments

@EnnoMeijers

The Globalise project publishes a dataset (https://datasetregister.netwerkdigitaalerfgoed.nl/show.php?lang=nl&uri=https%3A%2F%2Fhdl.handle.net%2F10622%2FLVXSBW) that contains a very large number of distributions (>6800?). This seems out of proportion to the regular use of distributions and might cause problems for applications that reuse the datasetregister data. It feels like the dataset is described at too granular a level. Is this type of use aligned with the intentions of the DCAT specification, or with our intended use?
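To make "a very large number of distributions" concrete for a consuming application, here is a minimal sketch that counts distributions per dataset and flags the outliers. It assumes the register data has already been reduced to (dataset IRI, distribution IRI) pairs. The function name, the example IRIs and the threshold of 100 are all hypothetical and not part of the datasetregister.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def datasets_over_threshold(
    pairs: Iterable[Tuple[str, str]], threshold: int
) -> List[Tuple[str, int]]:
    """Return (dataset, distribution count) for datasets above the threshold.

    `pairs` holds one (dataset IRI, distribution IRI) tuple per
    dcat:distribution statement. The threshold is an arbitrary assumption.
    """
    counts = Counter(dataset for dataset, _ in pairs)
    return sorted((d, n) for d, n in counts.items() if n > threshold)


# Hypothetical data: one dataset with 6800 distributions, one with a single file.
pairs = [
    ("https://example.org/dataset/a", f"https://example.org/dataset/a/file/{i}")
    for i in range(6800)
]
pairs.append(("https://example.org/dataset/b", "https://example.org/dataset/b/file/0"))

print(datasets_over_threshold(pairs, threshold=100))
```

A check like this only shows where the unusually granular descriptions are; whether they are a problem depends on what the consuming application does with each distribution.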

@coret
Contributor

coret commented Nov 13, 2024

@LvanWissen I think the transcription dataset would be more usable if it were provided as a single compressed file?

@LvanWissen

But that's not how Dataverse publishes the data, nor how it can ingest it: there is a file size limit.

@coret
Contributor

coret commented Nov 15, 2024

@LvanWissen https://support.dataverse.nl/support/solutions/articles/80001022346-upload-of-large-files-a-lot-of-files-in-dataversenl describes the "double zip" option. Is this a possibility for the transcriptions?

@LvanWissen

LvanWissen commented Nov 15, 2024

Right now, it's very easy for a user to download the PageXML of a single inventory number. With that goal in mind, we already double-zipped this data, so that at least the PageXML for a single inventory number is packed together.

We could have grouped the files up to the maximum upload size of this Dataverse (2GB), but that would have hindered usability, and you would still end up with several files/distributions.

All of this is possible, but what is the goal of these dataset descriptions? A machine is already perfectly capable of downloading the files, while a human can easily pick a single inventory number. Can we draw up guidelines or a best practice?
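As a rough illustration of that grouping alternative, here is a minimal sketch that greedily packs per-inventory archives, in order, into groups that stay under a size limit. The file names and sizes are made up, and reading "2GB" as decimal gigabytes is an assumption. This is not how the Globalise data was actually packaged.

```python
from typing import Dict, List

MB = 1000 ** 2
GB = 1000 ** 3  # assumption: decimal units; the limit is only given as "2GB"


def group_by_size_limit(sizes: Dict[str, int], limit: int = 2 * GB) -> List[List[str]]:
    """Pack files, in the given order, into groups whose total size is <= limit."""
    groups: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for name, size in sizes.items():
        if size > limit:
            raise ValueError(f"{name} alone exceeds the limit")
        # Start a new group when adding this file would overshoot the limit.
        if current and current_size + size > limit:
            groups.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        groups.append(current)
    return groups


# Hypothetical per-inventory archives and sizes.
sizes = {
    "inv_0001.zip": 900 * MB,
    "inv_0002.zip": 800 * MB,
    "inv_0003.zip": 700 * MB,
    "inv_0004.zip": 600 * MB,
}
print(group_by_size_limit(sizes))
```

With these made-up sizes the four archives collapse into two groups, which shows the trade-off: fewer distributions, but a user who wants one inventory number has to download a whole group.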
