Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support_description_long_description #183

Closed
wants to merge 3 commits into from

Conversation

joe-rabbit
Copy link

@joe-rabbit joe-rabbit commented Jan 4, 2024

Fix #181

I have added a function called handle_descriptions that takes in three inputs: default_description, description, and long_description. It returns the description either as long_description or description. I have also made changes in the get_zim_info function.

Additionally, the newer version of make_zim_file does not take favicon as an input parameter, so I changed it to illustrations.

Copy link
Collaborator

@benoit74 benoit74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this first version. Some modifications are necessary.

Please use the "Fix ####" format in your first comment to link this PR to the original issue (see https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests)

Please also not that it might not be that clear in the original issue but we need to check for validity of provided description and long description as soon as possible. Same for other ZIM parameters (e.g. illustation). Here we check after get_content has been called, which is an issue because it means the scraper can spend hours to retrieve the whole content and only then fail because the description is too long (for instance). You need to find a way to change this and check that ZIM metadata (illustration, description, title, long_description, ...) are valid. Basically a call to validate_metadata must be made asap to check everything is ok before downloading all content.

@@ -831,6 +842,24 @@ def render(self):
self.build_dir.joinpath("assets"),
)


def handle_descriptions(self, default_description, description=None, long_description=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have handled this change

openedx2zim/scraper.py Outdated Show resolved Hide resolved
openedx2zim/scraper.py Outdated Show resolved Hide resolved
openedx2zim/scraper.py Outdated Show resolved Hide resolved
requirements.txt Outdated Show resolved Hide resolved
@joe-rabbit
Copy link
Author

okay sure will make the necessary changes :)

@joe-rabbit
Copy link
Author

joe-rabbit commented Jan 8, 2024

@benoit74 ,Regarding the check, before downloading the zim file may i add the check before prepare_mooc_data() to check the ZIM_MetaData and call it again after the favicon gets downloaded ???

@benoit74
Copy link
Collaborator

benoit74 commented Jan 9, 2024

@benoit74 ,Regarding the check, before downloading the zim file may i add the check before prepare_mooc_data() to check the ZIM_MetaData and call it again after the favicon gets downloaded ???

This check must be done ASAP.

To be honest, I think you shouldn't spend too much time on this issue, the scraper is not working anymore and complex, you won't be able to really test your change. Taking an issue from youtube / ted / freecodecamp / kolibri / zimfarm would be more appropriate.

@benoit74
Copy link
Collaborator

benoit74 commented Feb 1, 2024

Closing this for now, feel free to reopen

@benoit74 benoit74 closed this Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Upgrade python-scraperlib to 3.x, including CLI support for description / long_description flags
2 participants