Simultaneously use ALTO and PAGE XML in a dataset? #60

Open
alix-tz opened this issue Feb 24, 2022 · 2 comments
Labels
question Further information is requested

Comments

@alix-tz
Member

alix-tz commented Feb 24, 2022

I might consider doing this with LECTAUREP, but I wonder what would be the best approach and how this would impact documenting the volumes and the dataset.

For example, I could use two different folders (/data/alto and /data/page), but then how would I declare the format in htr-united.yml, and would it be possible to break down the file counts by XML format (like files.alto = 100 and files.page = 100 instead of files = 200)?
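To make the per-format counts concrete, here is a minimal sketch of how they could be computed for a layout like the one described above. It assumes the files end in `.xml` and identifies each format from the root element name (`alto` for ALTO, `PcGts` for PAGE). The function names and the shape of the returned dictionary are hypothetical; whether htr-united.yml can actually hold per-format keys such as files.alto and files.page is exactly the open question here.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from pathlib import Path

# Root element local names: ALTO documents start with <alto>, PAGE with <PcGts>.
ROOT_TAG_TO_FORMAT = {"alto": "alto", "PcGts": "page"}


def detect_format(xml_path):
    """Return 'alto', 'page', or 'unknown' based on the root element name."""
    try:
        with open(xml_path, "rb") as handle:
            # The first "start" event is always the root element.
            for _event, element in ET.iterparse(handle, events=("start",)):
                # Drop the "{namespace}" prefix, if any, to get the local name.
                local_name = element.tag.rsplit("}", 1)[-1]
                return ROOT_TAG_TO_FORMAT.get(local_name, "unknown")
    except ET.ParseError:
        pass  # Malformed or empty XML is counted as unknown.
    return "unknown"


def count_files_by_format(data_dir):
    """Count .xml files under data_dir (recursively), grouped by detected format."""
    counts = Counter()
    for xml_path in sorted(Path(data_dir).rglob("*.xml")):
        counts[detect_format(xml_path)] += 1
    return dict(counts)
```

Calling `count_files_by_format("data")` on a repository with /data/alto and /data/page would return one count per detected format, which could then be reported separately or summed into a single files value.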

Other options could include:

  • creating two different repositories (ex: lectaurep-bronod-alto and lectaurep-bronod-page) but then, on top of doubling the actions to maintain the dataset, wouldn't it artificially expand the number of datasets in HTR-United's catalog?
  • documenting only one of the two formats (but then a user looking for a dataset in the format I didn't document would miss my dataset)

I don't find any of these options really satisfying. @PonteIneptique, do you have any opinion?

@alix-tz alix-tz added the question Further information is requested label Feb 24, 2022
@PonteIneptique
Member

> I don't find any of these options really satisfying. @PonteIneptique, do you have any opinion?

My opinion would really be "Don't do it"...

@PonteIneptique
Member

I'll answer the first part later :D
