New ZIM request: NHS conditions #323

Open
dattaz opened this issue Feb 2, 2021 · 19 comments
Labels: Bug (something isn't working), Medical (medical-related content), Upstream (tickets waiting for an upstream modification, typically in the scraper or the target website), Zimit

Comments

@dattaz

dattaz commented Feb 2, 2021

@Popolechien
Collaborator

Excellent idea. It looks like a pretty straightforward design, have you tried it on zimit?

@dattaz
Author

dattaz commented Feb 3, 2021

Running it through youzim.it seems to do a great job 👏 (though it may just be hitting the 1000-file limit).

@RavanJAltaie added the Medical and Zimit labels Jul 6, 2023
@RavanJAltaie
Contributor

Recipe created
https://farm.openzim.org/recipes/nhs.uk-conditions_en_all
I'll update the library link once ready

@RavanJAltaie self-assigned this Aug 15, 2024
@RavanJAltaie
Contributor

File is ready at the library
https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-08

@benoit74
Contributor

benoit74 commented Sep 2, 2024

Same CSS fix should be applied as in #1138

@benoit74 assigned benoit74 and unassigned RavanJAltaie Sep 2, 2024
@benoit74
Contributor

Custom CSS created. The recipe has been updated to publish to dev with this custom CSS, and a run has been requested. Let's see.

@Jaifroid
Collaborator

Jaifroid commented Sep 17, 2024

@benoit74 I've just noticed with this ZIM (I'm testing for the first time, having been away) that none of the videos appear to work. See for example the Heart Attack video at the bottom of this page: https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-09/www.nhs.uk/conditions/heart-attack/ . There are other examples, such as the Menstrual Cycle video at the bottom of this page: https://library.kiwix.org/viewer#nhs.uk-conditions_en_all_2024-09/www.nhs.uk/conditions/periods/ .

Clearly this is Zimit-related, and not specific to this ZIM, but I thought I should note it here.

EDIT: I tested in library.kiwix.org and in the PWA. Videos don't play in either.

@benoit74
Contributor

> I'm testing for the first time, having been away

For the record, you published this file to production on August 15; you probably already tested it, or at least you should have.

The fact that videos don't work is a known limitation of the scraper. Only YouTube videos are known to work in Zimit/Warc2zim, and this is not going to change in the coming months or years.

Is it critical enough that we should remove the ZIM from production? Or is the information present sufficiently valuable without videos?

@Jaifroid
Collaborator

> > I'm testing for the first time, having been away
>
> For the record, you published this file to production on August 15; you probably already tested it, or at least you should have.
>
> The fact that videos don't work is a known limitation of the scraper. Only YouTube videos are known to work in Zimit/Warc2zim, and this is not going to change in the coming months or years.
>
> Is it critical enough that we should remove the ZIM from production? Or is the information present sufficiently valuable without videos?

Hi @benoit74, I think you think you're replying to a different person! (I am not involved in publishing ZIMs.) The decision on whether it's critical is more for your team to make, but personally I'd say it's not critical because there is a lot of textual information. I don't know whether the underlying video files have been scraped, but if they have, then they bloat the ZIM if they can't be played, and it might be an idea to exclude them.

@benoit74
Contributor

Sorry @Jaifroid, too early in the morning, I was convinced it was Ravan speaking ^^

Your point regarding whether videos are bloating the ZIM is indeed a good one.

@benoit74
Contributor

I confirm the ZIM is bloated with the first seconds of every video. Unfortunately, I don't think we have sufficient tooling to exclude them from the ZIM; AFAIK we can do it only with openzim/zimit#353.

I think it would be super cool if we could also replace or even watermark video posters in such situations, so that we have something saying "videos not available in ZIM". I've opened openzim/warc2zim#396 to keep track of the idea.

I've also opened openzim/warc2zim#397 for a "let's dream a bit" scenario.

Regarding the current NHS conditions ZIM, and until these issues are solved, should we manually remove the useless items and publish it manually? It is work only a developer can do, but if we agree that we will not update the ZIM for the coming year, this might be worth it to avoid a big ZIM for nothing.

@RavanJAltaie added the Bug label Sep 19, 2024
@Popolechien
Collaborator

I was going to ask how bloated is bloated, but considering that NHS conditions is 4.5GB and NHS medicine is 13.5MB, I suspect I have an answer.
@benoit74 can you please remove these unviewable videos?

@benoit74
Contributor

> can you please remove these unviewable videos?

Do we agree this is a one-shot manual operation, and that I will not do it again for many months (i.e. the recipe will be disabled)?

We have no tooling for this, so I will have to do it "by hand", which is quite time-consuming.

@Jaifroid
Collaborator

@benoit74 Personally (but I guess it's @Popolechien's call), I'd say it is not something you should have to do "by hand", but rather something that could wait till openzim/zimit#353 is ready and it can be done automatically. I don't think it's so urgent as to take up valuable time that could be spent on other things. Sorry if I'm speaking (writing) out of turn! JMHO.

@Popolechien
Collaborator

> We have no tooling for this, so I will have to do it "by hand"

Ah no, I thought that your hands would be writing a handy script and voilà. Never mind, then. Let's wait for openzim/zimit/issues/353 as flagged by @Jaifroid

@benoit74
Contributor

Then we have to remove the file from production, right? If so, then please open a separate issue since the assignees are different.

@Popolechien
Collaborator

Yup. Opened #1163

@Popolechien
Collaborator

Wait - what's the policy again here? Keep it open as it's not ready, or close it because the recipe exists?

@benoit74
Contributor

Never close unless we know we will never make the ZIM. Here we have good hopes to do the ZIM, so only flag it as upstream + bug.

@benoit74 added the Upstream label Sep 19, 2024