Addition: Support for rewe.de #1378

Open
wants to merge 7 commits into
base: main

@Gamekohl commented Nov 13, 2024

Resolves #1379

@jayaddison
Collaborator

Thanks @Gamekohl! Could you run the scripts/reorder_json_keys.py script to re-order the JSON data to our standard format?
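For readers unfamiliar with this kind of normalization, here is a minimal sketch of what re-ordering JSON keys into a fixed layout can look like. It is not the project's `scripts/reorder_json_keys.py`; the `PREFERRED_ORDER` list, the `reorder_keys` helper, and the sample data are all hypothetical and chosen only for illustration.

```python
import json

# Hypothetical key order, for illustration only; the project's actual
# order is whatever scripts/reorder_json_keys.py defines.
PREFERRED_ORDER = ["author", "canonical_url", "host", "title"]


def reorder_keys(data: dict, preferred: list[str]) -> dict:
    """Return a copy of data with preferred keys first, then the rest sorted."""
    head = [key for key in preferred if key in data]
    tail = sorted(key for key in data if key not in preferred)
    # Dicts preserve insertion order, so the new dict serializes in this order.
    return {key: data[key] for key in head + tail}


# Made-up sample data, loosely shaped like a scraper test fixture.
sample = {"title": "Example", "host": "rewe.de", "yields": "4 servings", "author": None}
reordered = reorder_keys(sample, PREFERRED_ORDER)
print(json.dumps(reordered, indent=2, ensure_ascii=False))
```

Keeping a consistent key order across test fixtures makes diffs between scrapers easier to review, which is presumably the motivation for the request above.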

@Gamekohl
Author

@jayaddison done!

@jayaddison
Collaborator

One note about this scraper: it works correctly when given the recipe site's HTML, but when I try this library's online mode, I get errors from the scraper (probably an HTTP error response from the server). That's OK, but it's something to keep in mind.

@jayaddison
Collaborator

@Gamekohl thanks again. One more request: could you update the description here to 'Resolves #1379'? The original thread about rewe.de has some context regarding bot-filtering that I think we should keep open (for other recipe sites, where bot filtering is known to occur, we have an open issue tagged with the bots-protection label). So I've filed a separate issue to track this (maybe redundant -- but I don't think we should entirely close the original).

@@ -0,0 +1,39 @@
{
"author": null,
Collaborator

Hmm; some kind of bug in our SchemaOrg.author method, possibly. I think an author name should be appearing here.

Author

The rewe.de/rezepte recipes don't have an author

Collaborator

Ok, yep. The author is listed in the WebPage metadata, but for some reason we're not retrieving it from there. It's not a bug in your code; I'm going to spend some time investigating that.

Collaborator

I think I've tracked down the problem in #1380. Not exactly sure how best to resolve it yet; multiple options available.
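To illustrate the situation discussed in this thread, here is a minimal sketch of one possible lookup: prefer an author on the Recipe node and fall back to the WebPage node when the recipe has none. This is not the library's `SchemaOrg.author` implementation and not necessarily the fix chosen in #1380; the JSON-LD snippet, the `@graph` layout, and the helper names are hypothetical.

```python
import json

# Hypothetical JSON-LD shaped like the case described above: the Recipe
# node carries no author, while the WebPage node does.
JSON_LD = """
{
  "@context": "https://schema.org",
  "@graph": [
    {"@type": "WebPage",
     "author": {"@type": "Organization", "name": "Example Publisher"}},
    {"@type": "Recipe", "name": "Example Recipe"}
  ]
}
"""


def node_types(node):
    """Return the node's @type values as a list (it may be a string or a list)."""
    value = node.get("@type", [])
    return [value] if isinstance(value, str) else list(value)


def author_name(node):
    """Extract an author name from a node, or None if it has none."""
    author = node.get("author")
    if isinstance(author, list):
        author = author[0] if author else None
    if isinstance(author, dict):
        return author.get("name")
    return author if isinstance(author, str) else None


def find_author(graph):
    """Look for an author on Recipe nodes first, then on WebPage nodes."""
    for wanted in ("Recipe", "WebPage"):
        for node in graph:
            if wanted in node_types(node):
                name = author_name(node)
                if name:
                    return name
    return None


graph = json.loads(JSON_LD)["@graph"]
found = find_author(graph)
print(found)
```

Whether a page-level author should count as the recipe's author is a judgment call, which may be part of why several options are on the table.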

Successfully merging this pull request may close these issues.

Tracking issue: add support for rewe.de
3 participants