More changes for mindtouch scraper #208

Merged
merged 3 commits into from
Oct 25, 2024

Conversation

benoit74 (Collaborator) commented Oct 24, 2024

Changes:

  • fix the repr of ZimPath / HttpUrl, which was too long and not working
  • add support for omitting pre_head_insert in the HTML rewriter and the js_notify callback in all rewriters
  • instead of building items_to_download in the rewriter, return complete URL rewrite information so that 'users' (e.g. the mindtouch scraper) can decide what they want to do with it
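
To illustrate the last point, here is a minimal sketch of the "return rewrite information, let the caller decide" pattern. All names in it (`RewriteResult`, `rewrite_url`, the path layout) are hypothetical and chosen for illustration only; they are not the zimscraperlib API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin, urlsplit


# Hypothetical names throughout: this is NOT the zimscraperlib API, only a
# sketch of returning rewrite information instead of collecting downloads.
@dataclass(frozen=True)
class RewriteResult:
    absolute_url: str        # URL resolved against the document URL
    rewritten_url: str       # value to write back into the document
    zim_path: Optional[str]  # target path in the ZIM, None if not applicable


def rewrite_url(item_url: str, document_url: str) -> RewriteResult:
    """Resolve and rewrite one URL; does not record anything to download."""
    absolute = urljoin(document_url, item_url)
    parts = urlsplit(absolute)
    if parts.scheme not in ("http", "https"):
        # e.g. mailto: or data: URLs are left untouched
        return RewriteResult(absolute, item_url, None)
    zim_path = f"{parts.netloc}{parts.path}"
    return RewriteResult(absolute, "/" + zim_path, zim_path)


# The caller (e.g. a scraper) decides what to do with each result.
items_to_download = {}
for link in ("img/logo.png", "mailto:someone@example.org"):
    result = rewrite_url(link, "https://example.org/docs/page.html")
    if result.zim_path is not None:
        items_to_download[result.zim_path] = result.absolute_url
```

With this shape, the decision to download, skip, or handle an item differently lives in the caller rather than in the rewriter.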

Note:

  • this has been tested with upcoming changes for the mindtouch scraper and is supposed to be feature-complete. In the last (and biggest) change, the signatures of the rewriters are not significantly modified, so it is not expected to cause significant changes in warc2zim once we start "Move to zimscraperlib content rewriting" (warc2zim#411)

codecov bot commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (1c2c37e) to head (4ac7665).
Report is 4 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main      #208   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           38        38           
  Lines         2219      2221    +2     
  Branches       426       426           
=========================================
+ Hits          2219      2221    +2     

benoit74 requested a review from rgaudin on October 24, 2024 15:04
benoit74 marked this pull request as ready for review on October 24, 2024 15:13
rgaudin (Member) left a comment

LGTM

src/zimscraperlib/rewriting/html.py (resolved)
benoit74 merged commit 0cfd96b into main on Oct 25, 2024
9 checks passed
benoit74 deleted the mindtouch_changes branch on October 25, 2024 12:19