Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate to our new Python standard + fix many subtle issues #47

Merged
merged 45 commits into from
Jul 25, 2023
Merged

Conversation

benoit74
Copy link
Collaborator

@benoit74 benoit74 commented Jul 14, 2023

Rationale

Fix #46, i.e. migrate to our new Python standard + upgrade Zimscraperlib to 3.1.1

Changes

  • base Docker image is python:3.11-bookworm (upgrade to bookworm)
  • zimscraperlib is 3.1.1
  • use hatch instead of setuptools
  • introduce black / ruff / pyright (basic profile) / pre-commit
  • fix many simple formatting / language features / import ordering issues
  • fix many subtle bugs raised by ruff / pyright
  • publish on releases (instead of tags) to Docker (as before) AND to PyPi (new)
  • description is now limited to expected lenght and long description is set
  • icons and illustrations are squared as expected

@benoit74
Copy link
Collaborator Author

I'm a bit surprised that publishing to PyPi was not in Github workflow. Was it done manually previously ?
Because it is here and up-to-date: https://pypi.org/project/kolibri2zim/

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

  • Sorry about all the extra .encode() that you had to add for add_item_for(). We'll release scraperlib.
  • Same goes for all the flake8-errmsg ones.
  • Question: what's the reasoning behind your Path().as_posix() usage? Given we don't manipulate paths string outside of pathlib, it sounds a bit cumbersome.

.github/workflows/publish.yml Outdated Show resolved Hide resolved
.github/workflows/publish.yml Outdated Show resolved Hide resolved
.github/workflows/pull.yml Show resolved Hide resolved
.github/workflows/qa.yml Outdated Show resolved Hide resolved
.github/workflows/qa.yml Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
@benoit74
Copy link
Collaborator Author

Thank you !

Sorry about all the extra .encode() that you had to add for add_item_for(). We'll release scraperlib.

Do you have an ETA / release number I could watch? Should we wait for this scraperlib release for merge this PR, or go "as-is" and have a clean-up issue ?

Same goes for all the flake8-errmsg ones.

I'm not sure to get this one. Are you speaking about openzim/_python-bootstrap#17? Does it have something to do with scraperlib?

Question: what's the reasoning behind your Path().as_posix() usage? Given we don't manipulate paths string outside of pathlib, it sounds a bit cumbersome.

We need to transform the Path into a str. Instead of doing a simple str(xxx), I found .as_posix() more meaningfull. I could have used .__fspath__. WDYT?

@rgaudin
Copy link
Member

rgaudin commented Jul 18, 2023

Do you have an ETA / release number I could watch? Should we wait for this scraperlib release for merge this PR, or go "as-is" and have a clean-up issue ?

Released! 3.1.1

Same goes for all the flake8-errmsg ones.

I'm not sure to get this one. Are you speaking about openzim/_python-bootstrap#17? Does it have something to do with scraperlib?

Yes, nothing to do with scraperlib.

Question: what's the reasoning behind your Path().as_posix() usage? Given we don't manipulate paths string outside of pathlib, it sounds a bit cumbersome.

We need to transform the Path into a str. Instead of doing a simple str(xxx), I found .as_posix() more meaningfull. I could have used .__fspath__. WDYT?

It is more meaningful! It implies some posix requirement which is not exact. I think str() is fine

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you ;

  • Dockerfile LABEL is incorrect (repo name)
  • User-set description should be checked for length and refused if not compliant.
  • User should also be able to set LongDescription (checked as well).
  • Description should be user-defined first then from channel (stripped if over limit)
  • LongDescription should only be set if user-defined or if description is longer than limit. Otherwise, not set at all.

.github/workflows/publish.yml Outdated Show resolved Hide resolved
Dockerfile Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
pyproject.toml Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Jul 24, 2023

Codecov Report

❗ No coverage uploaded for pull request base (main@3adfca2). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@           Coverage Diff           @@
##             main      #47   +/-   ##
=======================================
  Coverage        ?   24.90%           
=======================================
  Files           ?        7           
  Lines           ?      775           
  Branches        ?      135           
=======================================
  Hits            ?      193           
  Misses          ?      570           
  Partials        ?       12           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@benoit74
Copy link
Collaborator Author

I've upgraded to bookworm instead of bullseye. I built the Docker image + perform a run locally, every ran fine (on only few nodes of a channel, but still pretty good sign)

@benoit74
Copy link
Collaborator Author

Could you have a look at all unresolved conversations as well, I think we are ok but would like to have your confirmation before marking as resolved.

@benoit74 benoit74 requested a review from rgaudin July 24, 2023 16:37
Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About ready

.github/workflows/publish.yml Outdated Show resolved Hide resolved
hatch_build.py Outdated Show resolved Hide resolved
src/kolibri2zim/scraper.py Outdated Show resolved Hide resolved
hatch_build.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
kolibri2zim/scraper.py Outdated Show resolved Hide resolved
@benoit74
Copy link
Collaborator Author

I think I finally understood your point regarding increasing the stack and retaining objects. I reverted multiple changes around this, adding few # pyright: ignore or sensible defaults so that the type checker is ok.

Copy link
Member

@rgaudin rgaudin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great ; you can change the datetime thing and rule and merge 👍

pyproject.toml Outdated Show resolved Hide resolved
src/kolibri2zim/scraper.py Outdated Show resolved Hide resolved
@benoit74 benoit74 merged commit 9448d0b into main Jul 25, 2023
6 checks passed
@benoit74 benoit74 deleted the issue_46 branch July 25, 2023 12:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Adapt the scraper to new rules
2 participants