Correct Language codes in Gutenberg recipes #217

RavanJAltaie · 2024-02-22T14:47:43Z

For Gutenberg, we use the "one-language-one-zim" mode in Zimfarm. In this mode, the language is set automatically by the scraper. The scraper is evidently creating ZIMs with an improper language, so this needs an upstream issue on the Gutenberg scraper; it is nothing you can solve yourself.

There are two issues:

- openZIM:gutenberg_mul_all is an improper ZIM name: mul is not a valid ISO-639-3 language code
- openZIM:gutenberg_rmr_all is an improper ZIM name: rmr is no longer a valid ISO-639-3 language code; as of 2010-01-18, [rmr] for Caló is deprecated and was split into Caló [rmq] and Erromintxela [emx]

Edit:

openZIM:gutenberg_mul_all:
- ZIM name is OK
- ZIM filename is OK
- ZIM language is not OK because mul is not a valid ISO-639-3 language code; it must be a comma-separated list of ISO-639-3 codes sorted by importance (here, by number of entries)
openZIM:gutenberg_rmr_all:
- rmr is no longer a valid ISO-639-3 language code; as of 2010-01-18, [rmr] for Caló is deprecated and was split into Caló [rmq] and Erromintxela [emx]
- ZIM name must be updated (to rmq probably)
- ZIM filename also
- ZIM language must be updated as well, could be rmq or rmq,emx
- might be solved upstream (Gutenberg)
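The "sorted by importance" rule above can be sketched as follows. This is only an illustration, not the scraper's actual code: the function name `languages_metadata` and the input shape (one language code per book) are assumptions, and ties are broken alphabetically only to keep the output stable.

```python
from collections import Counter


def languages_metadata(book_languages):
    """Build a comma-separated language list, most frequent code first.

    `book_languages` is a hypothetical input: one ISO-639-3 code per book.
    """
    counts = Counter(book_languages)
    # Sort by descending number of entries, then alphabetically for stability.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ",".join(code for code, _ in ordered)


if __name__ == "__main__":
    print(languages_metadata(["eng", "fra", "eng", "deu", "fra", "eng"]))
```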


eshellman · 2024-02-22T14:49:49Z

I can see about Caló (it's only one book) from upstream, but none of the others are language codes from PG, that I know of.

benoit74 · 2024-02-22T14:55:52Z

Thank you @eshellman, if you could fix rmr upstream that would be great; otherwise we would have to add a "hack" to our scraper to transform rmr into rmq,emx, since that is probably the real situation, or maybe only rmq.

mul is a hack for the ZIM we create with all languages. To respect the openZIM specification, the scraper should not do that and should list all languages instead. This part is for us ^^
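If the fix does not land upstream, the "hack" could look roughly like the sketch below. The mapping table and the function name `normalize_language_codes` are hypothetical; whether rmr should become rmq,emx or only rmq is still open in this thread, so the mapping is an assumption.

```python
# Hypothetical mapping from deprecated ISO-639-3 codes to their replacements.
# rmr was split into rmq (Caló) and emx (Erromintxela); using both is an assumption.
DEPRECATED_CODES = {"rmr": ["rmq", "emx"]}


def normalize_language_codes(codes):
    """Replace deprecated codes, keeping order and dropping duplicates."""
    normalized = []
    for code in codes:
        for replacement in DEPRECATED_CODES.get(code, [code]):
            if replacement not in normalized:
                normalized.append(replacement)
    return normalized


if __name__ == "__main__":
    print(",".join(normalize_language_codes(["rmr"])))
```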

rgaudin · 2024-02-22T15:36:58Z

@benoit74 Languages metadata must be a list of ISO-639-3 codes sorted by importance (here, by number of entries), but the Name metadata and the filename will keep the mul.
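A minimal sketch of that split, assuming the `gutenberg_<lang>_all` name pattern seen in this issue: the Name keeps mul for the all-languages ZIM, while the language metadata lists the real codes. The function name `zim_identity` and the dictionary keys are illustrative assumptions, not the scraper's API.

```python
def zim_identity(ordered_codes):
    """Return Name and language metadata for a Gutenberg ZIM.

    `ordered_codes` is assumed to be already sorted by number of entries.
    """
    # A multi-language ZIM keeps "mul" in its name; a single-language one uses its code.
    name_code = ordered_codes[0] if len(ordered_codes) == 1 else "mul"
    return {
        "Name": f"gutenberg_{name_code}_all",
        # The metadata value never contains "mul", only real ISO-639-3 codes.
        "Language": ",".join(ordered_codes),
    }


if __name__ == "__main__":
    print(zim_identity(["eng", "fra", "deu"]))
```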

benoit74 · 2024-02-22T15:47:09Z

> Languages metadata must be a list of ISO-639-3 sorted by importance (so number of entries here) but the Name metadata and the filename will keep the mul.

Yep, I had this in mind. Thank you for confirming before I even asked 😄

benoit74 · 2024-02-22T15:47:49Z

(and sorry for the wrong description in first comment, I wrote it too fast)

RavanJAltaie mentioned this issue Feb 22, 2024

Incorrect ISO-639-3 language codes openzim/zim-requests#822

Closed

benoit74 added the bug label Feb 22, 2024
